By Bar Ofner | March 16, 2022

YARA rules are an essential tool for security researchers that help them identify and classify malware samples. They do so by describing patterns and strings within malware code that can help an analyst identify known or new threats. YARA rules are also often integrated within commercial detection tools, or used internally to detect misbehaving binaries on the enterprise network.

But what if we don’t have the malicious file we are writing rules for? What if we want to create a “malware” file using YARA rules as input? Today, Team82 is making freely available via our GitHub repository a tool we call Arya that does just that. Arya can be used to generate custom-made, pseudo-malware files that trigger antivirus (AV) and endpoint detection and response (EDR) tools, just like the good old EICAR test file. Arya has a number of use cases, including malware research, YARA rule QA testing, and pressure testing a network with code samples built from YARA rules.

You can get Arya here.

 

Demonstrating Arya generating a pseudo-malicious file based on 60-plus YARA rules.

What is Arya?

Arya is a first-of-its-kind tool; it produces pseudo-malicious files meant to trigger YARA rules. The tool reads the given YARA (.yar suffix) files, parses their syntax using Avast’s yaramod package—the YARA parsing engine used in this research—and builds a pseudo “malware” file, carefully placing desired bytes from the YARA rules to trigger the input rules. 

Arya is a unique tool that produces pseudo-malicious files meant to trigger YARA rules.

 

The goal of the tool is to generate a tailor-made pseudo-malicious file that detection sensors such as AV or EDR will identify as the malware an input YARA rule is meant to detect. To achieve this goal, we not only add the necessary signatures, strings, and bytes from the input YARA rules, but also add some “touches” such as real PE headers, increased output-file entropy, x86 bytecode, and function prologue/epilogue assembly code. All of this helps the AV/EDR triggering process, and bypasses some heuristic checks these tools might have.

Arya takes YARA rules as input and generates a pseudo-malware file that triggers all the input rules.


Researchers may also use Arya to generate pseudo-malicious files using YARA rules as building blocks, producing files that will be identified as specific malware. For example, suppose you don’t have a Zeus malware sample but want to check how your AV reacts to one. No problem: load the Zeus YARA rules into Arya and generate your own “Zeus”-like pseudo-malware that AV/EDR tools will identify as Zeus.

Arya can also be used as part of incident response training—similar to purple-teaming—where pseudo-malicious files can be sent across the network to pressure test sensors and detectors in the network.

What’s Currently Supported—And What’s Not

Arya currently supports the following:

Supported 

  • Types
    • Strings – ASCII and Wide
    • Hex Streams (including Jumps and Alternations)
  • Arya functionalities
    • At operator
    • Int Functions (uint32, int16be, etc.)
    • Of operator (e.g. all of ($s*))

Planned for future updates 

  • RegEx type support
  • Base64 type support
  • FileSize Operator
  • Range Operator
  • String Count Operator 
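To make the supported types more concrete, here is a minimal Python sketch of how a string definition could be turned into bytes that match it. This is not Arya’s code: the helper names (`encode_text`, `encode_int`, `materialize_hex`) and the NOP filler byte are assumptions made for illustration, and the hex handling covers only flat patterns (no nested alternations).

```python
import re


def encode_text(text: str, wide: bool = False) -> bytes:
    """Bytes matched by a YARA text string; `wide` interleaves zero bytes."""
    raw = text.encode("ascii")
    if not wide:
        return raw
    return b"".join(bytes([b, 0]) for b in raw)


INT_FUNC = re.compile(r"(u?)int(8|16|32)(be)?")


def encode_int(func: str, value: int) -> bytes:
    """Bytes that make e.g. `uint32(0) == value` true when placed at that offset."""
    match = INT_FUNC.fullmatch(func)
    if match is None:
        raise ValueError(f"unsupported int function: {func}")
    unsigned, bits, big_endian = match.groups()
    return value.to_bytes(
        int(bits) // 8,
        "big" if big_endian else "little",  # YARA int functions default to little-endian
        signed=not unsigned,
    )


TOKEN = re.compile(r"\[[^\]]*\]|\([^)]*\)|[0-9A-Fa-f?]{2}")


def materialize_hex(pattern: str, filler: int = 0x90) -> bytes:
    """Pick one concrete byte sequence that matches a flat hex pattern."""
    out = bytearray()
    for token in TOKEN.findall(pattern):
        if token.startswith("["):
            # Jump such as [4], [2-4] or [-]: emit the minimum number of bytes.
            low = token[1:-1].split("-")[0].strip()
            out += bytes([filler]) * (int(low) if low else 0)
        elif token.startswith("("):
            # Alternation: any branch matches, so take the first one.
            out += materialize_hex(token[1:-1].split("|")[0], filler)
        elif token == "??":
            out.append(filler)
        else:
            # Plain byte or nibble wildcard (e.g. 4?): unknown nibbles become 0.
            out.append(int(token.replace("?", "0"), 16))
    return bytes(out)


print(encode_text("MZ", wide=True))                              # b'M\x00Z\x00'
print(encode_int("int16be", 0x1234).hex())                       # 1234
print(materialize_hex("6A 40 [2-4] ( 68 | 6A 00 ) ?? C3").hex())  # 6a40909068 90c3 without the space
```

Choosing the shortest jump and the first alternation branch keeps the output small; any other valid choice would trigger the same rule.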

Arya in Action

YARA files (.yar extension) are text files that contain one or more YARA rules. In this project, we used yaramod to turn YARA rules into a list of rules represented by Python objects, which can then be used to access the internal contents of a rule, such as strings, types, conditions, and more. Yaramod parses YARA rules into an abstract syntax tree (AST).

Traversal of the abstract syntax tree is done by using a combination of the Observer and Visitor design patterns. For every node the code visits, it determines which bytes and strings it needs to place, and where, in order to trigger the condition in the subtree it traverses. The result of the traversal is a mapping of strings to the possible offsets at which they can be placed; the traversal can also reserve some of those offsets in the file. This mapping is passed to the placer mechanism for further processing.
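The idea can be sketched in a few lines of Python. The node classes and `MappingVisitor` below are hypothetical stand-ins, not yaramod’s types, and the dispatch follows the style of Python’s `ast.NodeVisitor` rather than yaramod’s own visitor classes. The sketch records, for each string, the offset it must sit at (or `None` for “anywhere”).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StringRef:
    """`$a`: the string must appear somewhere in the file."""
    ident: str


@dataclass
class StringAt:
    """`$a at N`: the string must appear at a fixed offset."""
    ident: str
    offset: int


@dataclass
class And:
    left: object
    right: object


@dataclass
class Or:
    left: object
    right: object


class MappingVisitor:
    """Walks a condition tree and collects string -> required offset."""

    def __init__(self) -> None:
        self.mapping: Dict[str, Optional[int]] = {}

    def visit(self, node) -> None:
        # Dispatch on the node's class name, e.g. visit_And for And nodes.
        getattr(self, "visit_" + type(node).__name__)(node)

    def visit_StringRef(self, node: StringRef) -> None:
        self.mapping.setdefault(node.ident, None)

    def visit_StringAt(self, node: StringAt) -> None:
        self.mapping[node.ident] = node.offset

    def visit_And(self, node: And) -> None:
        # Both sides must hold, so both subtrees contribute strings.
        self.visit(node.left)
        self.visit(node.right)

    def visit_Or(self, node: Or) -> None:
        # One side is enough; this sketch simply takes the left branch.
        self.visit(node.left)


# Hypothetical condition: $mz at 0 and ($a or $b)
condition = And(StringAt("$mz", 0), Or(StringRef("$a"), StringRef("$b")))
visitor = MappingVisitor()
visitor.visit(condition)
print(visitor.mapping)  # {'$mz': 0, '$a': None}
```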

For example, below is a simple YARA rule with a few conditions:

In the AST parsing engine, this will be represented as:

An AST representing the above YARA rule.

 

And finally the mapping will look something like:

The table above closely mirrors the mapping representation in the code. The only difference is that the code uses internal yaramod types instead of string representations. These types make it easy to get the YARA string type (e.g. Hex, Base64, Plain Wide, etc.), and also provide a parsed version of the string that is easier to handle.

A representation of a result file generated by Arya.

 

After the table is produced for every rule in the file, it is passed to the placer, which decides on the best place for each mapping record, considering the minimum and maximum offsets, the spots pre-reserved during AST traversal, the type of the string, and the YARA action in the condition. The placer has its own data structure that manages the byte stream of the final file, reserves spots, and fills the empty sections in the file.
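A placer of this kind can be approximated with a byte buffer plus a parallel “reserved” map. The `Placer` class below is a simplified sketch under that assumption, not Arya’s actual data structure: it puts a payload at the first free position inside an allowed offset window and refuses to overwrite reserved bytes.

```python
from typing import Optional


class Placer:
    """Fixed-size output buffer that tracks which bytes are already taken."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.used = [False] * size

    def _is_free(self, start: int, length: int) -> bool:
        end = start + length
        return end <= len(self.data) and not any(self.used[start:end])

    def place(self, payload: bytes, min_off: int = 0,
              max_off: Optional[int] = None) -> int:
        """Write payload at the first free offset in [min_off, max_off]."""
        last = len(self.data) - len(payload) if max_off is None else max_off
        for start in range(min_off, last + 1):
            if self._is_free(start, len(payload)):
                end = start + len(payload)
                self.data[start:end] = payload
                self.used[start:end] = [True] * len(payload)
                return start
        raise ValueError("no free slot in the requested offset window")


placer = Placer(32)
print(placer.place(b"MZ", min_off=0, max_off=0))  # 0 (pinned, as with "at 0")
print(placer.place(b"evil"))                      # 2 (first free gap)
```

Placing the most tightly constrained records first (fixed offsets before “anywhere” strings) is a sensible ordering, since floating strings can always move out of the way.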

Arya’s main purpose is to trigger malware detection engines. In order to do that, we have to consider some things that matter in the internal workings of these AV and EDR engines.

The first consideration is native bytecode entropy. This is important because currently antivirus software measures the entropy of the file’s code in order to check if it is packed, encrypted, or obfuscated. Sometimes it can also be used for comparisons with other malicious items. 
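For reference, the entropy in question is usually the Shannon entropy of the byte distribution, measured in bits per byte. The snippet below is a generic illustration, not code taken from Arya or from any AV engine: constant data scores 0, while packed or encrypted data approaches the maximum of 8.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


print(shannon_entropy(b"\x00" * 1024))     # 0.0
print(shannon_entropy(bytes(range(256))))  # 8.0
```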

The “empty holes” in the output file, where none of the rules specify what to place, are filled with x86 bytecode taken from another malware sample. The user may specify any malware sample or other file to take the bytecode from by using the -m option and specifying the file name.
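As a rough sketch of that filling step, assume the output is a byte buffer with a parallel list marking which positions the rules already claimed. The `fill_gaps` helper below is hypothetical and simply cycles through the donor bytes; how Arya selects bytecode from the donor file is not described here.

```python
from itertools import cycle
from typing import List


def fill_gaps(data: bytes, used: List[bool], donor: bytes) -> bytes:
    """Copy donor bytes into every position not claimed by a rule."""
    if not donor:
        raise ValueError("donor must contain at least one byte")
    source = cycle(donor)  # repeat the donor if it is shorter than the gaps
    out = bytearray(data)
    for index, taken in enumerate(used):
        if not taken:
            out[index] = next(source)
    return bytes(out)


buffer = b"MZ\x00\x00\x00evil"
claimed = [True, True, False, False, False, True, True, True, True]
print(fill_gaps(buffer, claimed, donor=b"\x55\x8b"))  # b'MZU\x8bUevil'
```

In practice the donor bytes would come from the file passed with -m, for example via `pathlib.Path(name).read_bytes()`.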

Another consideration is x86 functions, in particular function counts. Today, a wide variety of antivirus and detection software might check the number of functions in a potentially malicious file. A file with more functions than a standard number could be considered malicious. Therefore, we decided to add as many function prologues and epilogues as possible, while retaining their correct order in the code.

Example function x86 prologue and epilogue:

We decided to consider them in the following way:

Where the two question marks (??) are a randomly generated number divisible by 4, encoded as a byte.
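A sketch of generating such stubs in Python follows. It assumes the conventional 32-bit frame setup (`push ebp; mov ebp, esp; sub esp, imm8`) and teardown (`mov esp, ebp; pop ebp; ret`), with the random byte used as the stack-frame size; the exact byte patterns Arya emits may differ, and `fake_function` is a hypothetical name.

```python
import random
from typing import Optional

# push ebp; mov ebp, esp; sub esp, <imm8 follows>
PROLOGUE = bytes([0x55, 0x8B, 0xEC, 0x83, 0xEC])
# mov esp, ebp; pop ebp; ret
EPILOGUE = bytes([0x8B, 0xE5, 0x5D, 0xC3])


def fake_function(body: bytes = b"", rng: Optional[random.Random] = None) -> bytes:
    """Wrap body in a prologue/epilogue pair with a random frame size."""
    rng = rng or random.Random()
    # Multiple of 4 that still fits in a positive signed 8-bit immediate.
    frame_size = rng.randrange(4, 0x80, 4)
    return PROLOGUE + bytes([frame_size]) + body + EPILOGUE


stub = fake_function(b"\x90\x90")
print(stub.hex(" "))
```

Emitting each prologue before its matching epilogue keeps the pairs in the order a disassembler would expect for real functions.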

Both of these additions make the file look more like a real malware file. As a result, the pseudo-malicious files Arya outputs trigger more antivirus software and detection engines, which detect them as real malware.

Conclusion 

Today’s release of Arya gives security researchers, network analysts, and incident response teams an effective tool to test YARA rules, their software and themselves. YARA rules are an essential means of classifying and identifying malware samples. And Arya is a means by which organizations can test the security of their networks, train their IR teams and also improve their defense tools and software. 

We invite you to download Arya from our GitHub repository, and join our Team82 Research Slack community to discuss it and share best practices and success stories with our research team and peers.