Balamurgan Chirtsabesan
(balamc@cs.arizona.edu)
Tapas Ranjan Sahoo
(tapas@cs.arizona.edu)
This is a static watermarking algorithm.
Robust Object Watermarking takes a different approach to
watermarking. Instead of applying the watermarking scheme to the raw
code directly, it builds a vector representation of the code. The
basic idea is that, instead of considering
the overall structure of the code and its control flow, the code is
viewed as a statistical object. The frequencies of groups of
instructions across the entire code are used to build
this vector representation. Because the watermark is spread
throughout the target code, it offers considerable resistance to
intentional and unintentional attacks.
The watermark value is passed as a string parameter, converted into a frequency vector, and embedded in the code. Each vector element corresponds to an instruction group, which is identified from profiling information. Embedding techniques such as code substitution and instruction group insertion are used to build the watermark into the code. The basic idea is to increase the frequency of those instruction groups that form components of the watermark vector. The embedding module consults the 'CodeBook' to carry out the different types of embedding.
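The frequency-vector view of the code can be illustrated with a minimal sketch. It assumes that a method body is already available as a flat list of opcode mnemonics and that each instruction group is a fixed sequence of mnemonics. The class name GroupFrequency and its methods are hypothetical and are not taken from the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: counts how often each instruction group
// occurs in a flat list of opcode mnemonics.
class GroupFrequency {

    // Number of (possibly overlapping) occurrences of `group` in `code`.
    static int count(List<String> code, List<String> group) {
        if (group.isEmpty()) {
            return 0;
        }
        int hits = 0;
        for (int i = 0; i + group.size() <= code.size(); i++) {
            if (code.subList(i, i + group.size()).equals(group)) {
                hits++;
            }
        }
        return hits;
    }

    // One vector element per instruction group, in the order given.
    static int[] vector(List<String> code, List<List<String>> groups) {
        int[] v = new int[groups.size()];
        for (int g = 0; g < groups.size(); g++) {
            v[g] = count(code, groups.get(g));
        }
        return v;
    }

    public static void main(String[] args) {
        List<String> code = Arrays.asList(
            "iload", "iload", "isub", "ifne", "iload", "isub", "istore");
        List<List<String>> groups = new ArrayList<>();
        groups.add(Arrays.asList("iload", "isub"));
        groups.add(Arrays.asList("iload", "iload"));
        System.out.println(Arrays.toString(vector(code, groups))); // prints [2, 1]
    }
}
```

Embedding then amounts to transforming the code so that selected elements of this vector grow by the amounts the watermark calls for.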
RESTRICTIONS:
The algorithm is generally applicable only to large applications, where
the code is large enough to hold the watermark vector. Embedding new
instructions in code is delicate, since the result must satisfy a number
of criteria, such as maintaining a proper stack size and proper variable
initialization and use. In very small applications there might not
be enough scope to embed the entire vector,
especially with approaches other than code substitution. The algorithm
uses a "decision procedure" to make sure that the implementation does
not go into an infinite loop. In case there is no further scope for
embedding, the program exits with a log message saying that the
watermark embedding was not completed.
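The termination behaviour described above can be sketched as a loop that stops as soon as an embedding attempt makes no progress. This is only an illustration under assumptions: the class EmbeddingLoop, the method embed, and the tryEmbedOnce callback are hypothetical names, and the real "decision procedure" may work differently.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration of an embedding loop that cannot spin forever.
class EmbeddingLoop {

    // Tries to raise one group's frequency by `needed`.
    // `tryEmbedOnce` stands in for any embedding technique; it returns
    // false when no further embedding site is available.
    static int embed(int needed, BooleanSupplier tryEmbedOnce) {
        int done = 0;
        while (done < needed) {
            if (!tryEmbedOnce.getAsBoolean()) {
                // No progress is possible: report and stop instead of retrying.
                System.err.println("watermark embedding was not completed: "
                    + done + " of " + needed + " embeddings applied");
                break;
            }
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        int[] freeSites = {3}; // pretend the code offers only three sites
        int applied = embed(5, () -> freeSites[0]-- > 0);
        System.out.println(applied); // prints 3
    }
}
```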
As a simple example, consider a substitution procedure for embedding a vector instruction group. Say iload; isub is a two-element instruction group that is a component of the watermark vector. The 'CodeBook' stores a corresponding substitution as: A{ iload X, iload Y, if_icmpne -> Z } --> B{ iload X, iload Y, isub, ifne -> Z }. Finding an occurrence of A in the code and replacing it with the semantically equivalent group B therefore increases the frequency of that vector component by one.
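The substitution above can be sketched on a textual instruction list. The sketch assumes instructions are held as strings with symbolic branch labels, so no branch offsets need to be recomputed, and it ignores the short forms such as iload_0. The class CompareSubstitution and the method substituteFirst are hypothetical names, not part of the 'CodeBook'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the A --> B substitution on textual bytecode.
class CompareSubstitution {

    // Rewrites the first `iload X; iload Y; if_icmpne Z` into
    // `iload X; iload Y; isub; ifne Z`. Returns true if a site was found.
    static boolean substituteFirst(List<String> code) {
        for (int i = 0; i + 2 < code.size(); i++) {
            if (code.get(i).startsWith("iload ")
                    && code.get(i + 1).startsWith("iload ")
                    && code.get(i + 2).startsWith("if_icmpne ")) {
                String target = code.get(i + 2).substring("if_icmpne ".length());
                code.set(i + 2, "isub");           // x - y replaces the compare
                code.add(i + 3, "ifne " + target); // branch if the difference is nonzero
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> code = new ArrayList<>(Arrays.asList(
            "iload 1", "iload 2", "if_icmpne L9", "return"));
        substituteFirst(code);
        System.out.println(code); // prints [iload 1, iload 2, isub, ifne L9, return]
    }
}
```

The two forms agree because an int subtraction, even when it wraps around, yields zero exactly when both operands are equal.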
Recognition follows a different approach. Instead of extracting the watermark directly from the target code, the recognizer is given the original code, the watermarked code, and the watermark to test for. It then answers 'yes' or 'no' according to whether that watermark is present in the code.
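One plausible reading of this yes/no check is a comparison of frequency vectors, sketched below. Everything here is an assumption made for illustration: the class FrequencyRecognizer, the method recognize, the idea that the watermark shows up as the per-group difference between the two programs, and the use of the threshold as a per-element tolerance. The actual recognizer may compare the vectors in another way.

```java
// Hypothetical illustration of a yes/no recognizer over frequency vectors.
class FrequencyRecognizer {

    // `original` and `marked` hold the group frequencies of the two programs;
    // `watermark` holds the increase expected for each group.
    static boolean recognize(int[] original, int[] marked,
                             int[] watermark, int threshold) {
        if (original.length != marked.length
                || marked.length != watermark.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        for (int i = 0; i < watermark.length; i++) {
            int observed = marked[i] - original[i];
            // Assumed rule: each element may deviate by at most `threshold`.
            if (Math.abs(observed - watermark[i]) > threshold) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] original = {4, 2, 7};
        int[] marked = {6, 2, 10};
        System.out.println(recognize(original, marked, new int[] {2, 0, 3}, 1)); // prints true
        System.out.println(recognize(original, marked, new int[] {5, 0, 3}, 1)); // prints false
    }
}
```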
EMBEDDING THE WATERMARK:
Input the original jar file (e.g. A.jar). The watermarked jar
file is produced as A_wm.jar. Enter the watermark value in the 'watermark' field as
well as in the 'key' field. The watermark must be an integer with
8 or fewer digits.
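The constraint on the watermark value can be checked before embedding is attempted. The sketch below assumes a non-negative decimal integer; whether a sign or leading zeros are accepted is not stated here. The class WatermarkInput and the method isValid are hypothetical names.

```java
// Hypothetical illustration: validates a watermark string before embedding.
class WatermarkInput {

    // True if `s` consists of one to eight decimal digits.
    static boolean isValid(String s) {
        return s != null && s.matches("[0-9]{1,8}");
    }

    public static void main(String[] args) {
        System.out.println(isValid("12345678"));  // prints true
        System.out.println(isValid("123456789")); // prints false
    }
}
```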
WATERMARK RECOGNITION:
Input the watermarked jar file (i.e. A_wm.jar) and the original jar file
(i.e. A.jar). Enter the watermark that
you wish to detect in the 'key' field. The recognizer then outputs
"WATERMARK FOUND" or "WATERMARK NOT FOUND" depending on whether that
watermark was detected in the target code (i.e. A_wm.jar). The
recognizer accepts a watermark that matches to within a threshold value. The
default 'recognition threshold' is 1. This threshold can be
changed in the 'myRecognitionThreshold' field of the 'Config' class.