
Function name mangling for C++ and Java

G++ internals - Mangling


Both C++ and Java provide overloaded functions and methods, that is, methods with the same name but different parameter lists. The correct version is selected at compile time. Although overloaded functions share a name in the source code, they must be translated into different assembler-level names, because typical assemblers and linkers cannot handle overloading. Encoding the parameter types together with the method name into a unique name is called name mangling; the inverse process is called demangling.

It is convenient for C++ and Java to use compatible mangling schemes, since that makes life easier for tools such as gdb and eases integration between C++ and Java code.

Note that there is also a standard "Java Native Interface" (JNI), which uses a different calling convention and a different mangling scheme. The JNI is a rather abstract ABI that lets Java call methods written in C or C++; here we are concerned with a lower-level interface intended primarily for methods written in Java, but which can also be used for methods written in C++ (and, less easily, in C).

Method name mangling

C++ mangles a method by emitting the function name, followed by __, followed by encodings of any method qualifiers (such as const), followed by the mangling of the method's class, followed by the mangling of the parameters, in order.

For example Foo::bar(int, long) const is mangled as `bar__C3Fooil'.

For a constructor, the method name is left out. That is, Foo::Foo(int, long) is mangled as `__3Fooil'. (A constructor cannot be const-qualified, so no `C' appears.)

GNU Java does the same.
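The method rule above can be sketched as a small string-building function. This is a minimal illustration, not G++'s implementation: the names `mangle_simple` and `mangle_method` are hypothetical, parameter types are passed in already mangled (e.g. "i", "l"), only the const qualifier is modeled, and the class name is assumed to be plain ASCII.

```cpp
#include <string>
#include <vector>

// Encode a plain ASCII name as its length followed by its characters,
// e.g. "Foo" -> "3Foo". (Unicode-escaped names are not handled here.)
std::string mangle_simple(const std::string& name) {
    return std::to_string(name.size()) + name;
}

// Hypothetical helper following the rule described above:
// method name, "__", qualifier codes ("C" for const), class encoding,
// then each parameter's encoding in order.
// An empty method name stands for a constructor.
std::string mangle_method(const std::string& method, const std::string& cls,
                          bool is_const,
                          const std::vector<std::string>& param_codes) {
    std::string out = method + "__";
    if (is_const) out += "C";
    out += mangle_simple(cls);
    for (const auto& p : param_codes) out += p;
    return out;
}
```

For example, `mangle_method("bar", "Foo", true, {"i", "l"})` yields `bar__C3Fooil`, and a constructor `mangle_method("", "Foo", false, {"i", "l"})` yields `__3Fooil`.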

Primitive types

The C++ types int, long, short, char, and long long are mangled as `i', `l', `s', `c', and `x', respectively. The corresponding unsigned types have `U' prefixed to the mangling. The type signed char is mangled `Sc'.

The C++ and Java floating-point types float and double are mangled as `f' and `d' respectively.

The C++ bool type and the Java boolean type are mangled as `b'.

The C++ wchar_t and the Java char types are mangled as `w'.

The Java integral types byte, short, int and long are mangled as `c', `s', `i', and `x', respectively.

C++ code that includes javatypes.h will mangle the typedefs jbyte, jshort, jint and jlong as `c', `s', `i', and `x', respectively. (This has not been implemented yet.)
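The primitive-type codes above amount to a lookup table plus the `U' prefix rule for unsigned types. The sketch below assumes type names are given as plain source spellings ("unsigned long", "signed char"); the function names are hypothetical.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// C++ primitive types. "unsigned X" is handled by prefixing "U" to the
// code for X; "signed char" is the special case "Sc".
std::string mangle_primitive(const std::string& type) {
    static const std::map<std::string, std::string> codes = {
        {"int", "i"},  {"long", "l"},   {"short", "s"},   {"char", "c"},
        {"long long", "x"}, {"float", "f"}, {"double", "d"},
        {"bool", "b"}, {"wchar_t", "w"}, {"signed char", "Sc"},
    };
    const std::string prefix = "unsigned ";
    if (type.compare(0, prefix.size(), prefix) == 0)
        return "U" + mangle_primitive(type.substr(prefix.size()));
    auto it = codes.find(type);
    if (it == codes.end())
        throw std::invalid_argument("not a primitive type: " + type);
    return it->second;
}

// Java primitive types share the same codes: byte is "c", long is "x",
// and Java char (16-bit) is "w" like wchar_t.
std::string mangle_java_primitive(const std::string& type) {
    static const std::map<std::string, std::string> codes = {
        {"byte", "c"}, {"short", "s"}, {"int", "i"},     {"long", "x"},
        {"char", "w"}, {"boolean", "b"}, {"float", "f"}, {"double", "d"},
    };
    auto it = codes.find(type);
    if (it == codes.end())
        throw std::invalid_argument("not a Java primitive type: " + type);
    return it->second;
}
```

Because Java long and C++ long long both map to `x', a Java method taking a long and a C++ function taking a long long get the same parameter encoding.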

Mangling of simple names

A simple class, package, template, or namespace name is encoded as the number of characters in the name, followed by the actual characters. Thus the class Foo is encoded as `3Foo'.

If any character in the name is not alphanumeric (i.e. not one of the standard ASCII letters, digits, or '_'), or if the initial character is a digit, the name is mangled as a sequence of encoded Unicode characters. A Unicode encoding starts with a `U' to indicate that Unicode escapes are used, followed by the number of bytes in the encoding, followed by the bytes themselves. ASCII letters and non-initial digits are encoded unchanged; all other characters (including underscore and an initial digit) are translated into an underscore followed by the big-endian, 4-hex-digit, lower-case encoding of the character.

If a method name contains Unicode-escaped characters, the entire mangled method name is followed by a `U'.

For example, the method X\u0319::M\u002B(int) is encoded as `M_002b__U6X_0319iU'.
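The escaping rules for simple names can be sketched as follows. This assumes names are given as UTF-16 strings (`std::u16string`, matching Java's 16-bit char) and that the trailing `U' is added when the method's own name needed escaping, which is one reading of the rule above; all function names are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Characters allowed in a plain (unescaped) name: ASCII letters, digits, '_'.
static bool is_plain(char16_t ch) {
    return (ch >= u'a' && ch <= u'z') || (ch >= u'A' && ch <= u'Z') ||
           (ch >= u'0' && ch <= u'9') || ch == u'_';
}

// A name needs the Unicode form if it starts with a digit or contains
// any non-plain character.
static bool needs_unicode(const std::u16string& name) {
    if (name.empty()) return false;
    if (name[0] >= u'0' && name[0] <= u'9') return true;
    for (char16_t ch : name)
        if (!is_plain(ch)) return true;
    return false;
}

// Copy a name known to be plain ASCII into a std::string.
static std::string to_ascii(const std::u16string& s) {
    std::string out;
    for (char16_t ch : s) out += static_cast<char>(ch);
    return out;
}

// ASCII letters and non-initial digits are copied; every other character
// (including '_' and an initial digit) becomes '_' + 4 lower-case hex digits.
static std::string escape_name(const std::u16string& name) {
    std::string out;
    for (std::size_t i = 0; i < name.size(); ++i) {
        char16_t ch = name[i];
        bool letter = (ch >= u'a' && ch <= u'z') || (ch >= u'A' && ch <= u'Z');
        bool digit = ch >= u'0' && ch <= u'9';
        if (letter || (digit && i > 0)) {
            out += static_cast<char>(ch);
        } else {
            char buf[8];
            std::snprintf(buf, sizeof buf, "_%04x", static_cast<unsigned>(ch));
            out += buf;
        }
    }
    return out;
}

// Class/package/namespace name: "<len><name>" or "U<len><escaped>".
std::string mangle_name(const std::u16string& name) {
    if (needs_unicode(name)) {
        std::string esc = escape_name(name);
        return "U" + std::to_string(esc.size()) + esc;
    }
    return std::to_string(name.size()) + to_ascii(name);
}

// Method names are escaped in place (no "U<len>" prefix); if escaping was
// needed, the whole mangled name gets a trailing "U".
std::string mangle_unicode_method(const std::u16string& method,
                                  const std::u16string& cls,
                                  const std::string& param_codes) {
    bool escaped = needs_unicode(method);
    std::string head = escaped ? escape_name(method) : to_ascii(method);
    std::string out = head + "__" + mangle_name(cls) + param_codes;
    if (escaped) out += "U";
    return out;
}
```

With these helpers, `mangle_unicode_method(u"M+", u"X\u0319", "i")` reproduces the example above, `M_002b__U6X_0319iU`.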

Pointer and reference types

A C++ pointer type is mangled as `P' followed by the mangling of the type pointed to.

A C++ reference type is mangled as `R' followed by the mangling of the type referenced.

A Java object reference type is equivalent to a C++ pointer parameter, so we mangle such a parameter type as `P' followed by the mangling of the class name.
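Pointer and reference encodings are prefix codes applied recursively to the target type. The sketch below uses a tiny hypothetical `Type` tree whose leaves hold an already-mangled encoding (such as "i" or "3Foo"); it is an illustration of the rule, not G++'s internal representation.

```cpp
#include <memory>
#include <string>
#include <utility>

// Minimal type tree for this sketch: a Leaf carries an already-mangled
// encoding; Pointer and Reference wrap a target type.
struct Type {
    enum Kind { Leaf, Pointer, Reference } kind;
    std::string code;              // used when kind == Leaf
    std::shared_ptr<Type> target;  // used for Pointer and Reference
};

std::shared_ptr<Type> leaf(const std::string& code) {
    return std::make_shared<Type>(Type{Type::Leaf, code, nullptr});
}
std::shared_ptr<Type> pointer_to(std::shared_ptr<Type> t) {
    return std::make_shared<Type>(Type{Type::Pointer, "", std::move(t)});
}
std::shared_ptr<Type> reference_to(std::shared_ptr<Type> t) {
    return std::make_shared<Type>(Type{Type::Reference, "", std::move(t)});
}

// "P" + pointee encoding, "R" + referent encoding; leaves are emitted as-is.
std::string mangle_type(const Type& t) {
    switch (t.kind) {
        case Type::Pointer:   return "P" + mangle_type(*t.target);
        case Type::Reference: return "R" + mangle_type(*t.target);
        default:              return t.code;
    }
}

// A Java object reference parameter is treated like a C++ pointer to the class.
std::string mangle_java_object_param(const std::string& class_code) {
    return mangle_type(*pointer_to(leaf(class_code)));
}
```

So `int*` becomes `Pi`, `Foo&` becomes `R3Foo`, `char**` becomes `PPc`, and a Java parameter of class Foo becomes `P3Foo`.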

Qualified names

Both C++ and Java allow a class to be lexically nested inside another class. C++ also supports namespaces (not yet implemented by G++). Java also supports packages.

These are all mangled the same way: First the letter `Q' indicates that we are emitting a qualified name. That is followed by the number of parts in the qualified name. If that number is 9 or less, it is emitted with no delimiters. Otherwise, an underscore is written before and after the count. Then follows each part of the qualified name, as described above.

For example Foo::\u0319::Bar is encoded as `Q33FooU5_03193Bar'.
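The qualified-name rule, including the underscore delimiters for counts above 9, can be sketched as below. The parts are assumed to arrive already encoded as simple names (so `\u0319` is passed as "U5_0319"); `mangle_dotted` is a hypothetical convenience for plain ASCII dotted names such as Java package paths.

```cpp
#include <string>
#include <vector>

// "Q", the number of parts (wrapped in '_' when greater than 9), then
// each part's own simple-name encoding.
std::string mangle_qualified(const std::vector<std::string>& encoded_parts) {
    std::string count = std::to_string(encoded_parts.size());
    std::string out = "Q";
    out += encoded_parts.size() <= 9 ? count : "_" + count + "_";
    for (const auto& p : encoded_parts) out += p;
    return out;
}

// Convenience for plain ASCII dotted names like "java.lang.String".
// A single unqualified name is just its simple encoding.
std::string mangle_dotted(const std::string& dotted) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        std::size_t dot = dotted.find('.', start);
        std::string part = dotted.substr(start, dot - start);  // clamps at npos
        parts.push_back(std::to_string(part.size()) + part);
        if (dot == std::string::npos) break;
        start = dot + 1;
    }
    return parts.size() == 1 ? parts[0] : mangle_qualified(parts);
}
```

For instance, `mangle_qualified({"3Foo", "U5_0319", "3Bar"})` gives `Q33FooU5_03193Bar`, and `mangle_dotted("java.lang.String")` gives `Q34java4lang6String`.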

Templates

A class template instantiation is encoded as the letter `t', followed by the encoding of the template name, the number of template parameters, and the encoding of each template parameter. If a template parameter is a type, it is written as a `Z' followed by the encoding of the type.
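A sketch of the class-template rule, restricted to type arguments: the text does not spell out how constant arguments are encoded, or whether argument counts above 9 are delimited, so this hypothetical helper handles only the type case and writes the count as plain decimal. Template name and argument encodings are passed in already mangled.

```cpp
#include <string>
#include <vector>

// "t", the template name's encoding, the number of arguments, then
// "Z" + encoding for each type argument. Constant arguments are not
// modeled in this sketch.
std::string mangle_template_instance(const std::string& template_name_code,
                                     const std::vector<std::string>& type_arg_codes) {
    std::string out = "t" + template_name_code +
                      std::to_string(type_arg_codes.size());
    for (const auto& arg : type_arg_codes) out += "Z" + arg;
    return out;
}
```

For example, a hypothetical `JArray<int>` becomes `t6JArray1Zi`, and `Map<int, char*>` becomes `t3Map2ZiZPc`.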

A function template specialization (either an instantiation or an explicit specialization) is encoded as an `H', followed by the encoding of the template parameters as described above, an `_', the encoding of the argument types of the template function (not of the specialization), another `_', and the return type. (Like the argument types, the return type is that of the function template, not of the specialization.) Within the argument and return types, template parameters are encoded as an `X' for type parameters or a `Y' for constant parameters, followed by an index indicating their position in the template parameter list declaration.

Arrays

C++ array types are mangled by emitting `A', followed by the length of the array, followed by an `_', followed by the mangling of the element type. Of course, array parameter types normally decay into pointer types, so you rarely see this.

Java arrays are objects. A Java type T[] is mangled as if it were the C++ type JArray<T>, passed (like any Java object reference) as a pointer. For example, java.lang.String[] is encoded as `Pt6JArray1ZPQ34java4lang6String'.
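Both array encodings can be sketched directly from the rules above. Element encodings are assumed to be passed in already mangled; the Java case spells out the pieces: `P` for the object reference, then a one-argument `t6JArray` template instance.

```cpp
#include <string>

// C++ array: "A", the length, "_", then the element type's encoding.
std::string mangle_cxx_array(unsigned long length, const std::string& elem_code) {
    return "A" + std::to_string(length) + "_" + elem_code;
}

// Java T[]: a pointer ("P") to the template instance JArray<T>, i.e.
// "t" + "6JArray" + one argument ("1") + "Z" + encoding of T.
std::string mangle_java_array(const std::string& elem_code) {
    return std::string("P") + "t" + "6JArray" + "1" + "Z" + elem_code;
}
```

Since the element of `String[]` is itself an object reference (`PQ34java4lang6String`), `mangle_java_array` reproduces the example above; nesting it handles multi-dimensional arrays such as `int[][]` (`Pt6JArray1ZPt6JArray1Zi`).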

Table of demangling code characters

The following special characters are used in mangling:

`A'
Indicates a C++ array type.
`b'
Encodes the C++ bool type, and the Java boolean type.
`c'
Encodes the C++ char type, and the Java byte type.
`C'
A modifier to indicate a const type. Also used to indicate a const member function (in which case it precedes the encoding of the method's class).
`d'
Encodes the C++ and Java double types.
`e'
Indicates extra unknown arguments ....
`f'
Encodes the C++ and Java float types.
`F'
Used to indicate a function type.
`H'
Used to indicate a template function.
`i'
Encodes the C++ and Java int types.
`J'
Indicates a complex type.
`l'
Encodes the C++ long type.
`P'
Indicates a pointer type. Followed by the type pointed to.
`Q'
Used to mangle qualified names, which arise from nested classes. Should also be used for namespaces (?). In Java used to mangle package-qualified names, and inner classes.
`r'
Encodes the GNU C++ long double type.
`R'
Indicates a reference type. Followed by the referenced type.
`s'
Encodes the C++ and Java short types.
`S'
A modifier that indicates that the following integer type is signed. Only used with char. Also used as a modifier to indicate a static member function.
`t'
Indicates a template instantiation.
`T'
A back reference to a previously seen type.
`U'
A modifier that indicates that the following integer type is unsigned. Also used to indicate that the following class or namespace name is encoded using Unicode-mangling.
`v'
Encodes the C++ and Java void types.
`V'
A modifier for a volatile type or method.
`w'
Encodes the C++ wchar_t type, and the Java char types.
`x'
Encodes the GNU C++ long long type, and the Java long type.
`X'
Encodes a template type parameter, when part of a function type.
`Y'
Encodes a template constant parameter, when part of a function type.
`Z'
Used for template type parameters.

The letters `G', `M', `O', and `p' also seem to be used for obscure purposes ...
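Demangling is the inverse process, and for the simpler codes in the table it is a small recursive-descent parse. The sketch below is a hypothetical `ParamDemangler` that decodes a parameter-list encoding using only the builtin type letters, the `U', `S' and `C' modifiers, `P' and `R', and length-prefixed class names; anything else (`Q', `t', `T', `H', ...) is rejected rather than guessed at.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Decodes a mangled parameter list such as "UlPCc" into readable C++.
class ParamDemangler {
public:
    explicit ParamDemangler(const std::string& s) : s_(s) {}

    // Decode the whole string as a comma-separated parameter list.
    std::string params() {
        std::string out;
        while (pos_ < s_.size()) {
            if (!out.empty()) out += ", ";
            out += type();
        }
        return out;
    }

private:
    // Decode one type starting at pos_.
    std::string type() {
        if (pos_ >= s_.size()) throw std::runtime_error("unexpected end");
        char c = s_[pos_++];
        switch (c) {
            case 'b': return "bool";
            case 'c': return "char";
            case 'd': return "double";
            case 'f': return "float";
            case 'i': return "int";
            case 'l': return "long";
            case 'r': return "long double";
            case 's': return "short";
            case 'v': return "void";
            case 'w': return "wchar_t";
            case 'x': return "long long";
            case 'U': return "unsigned " + type();
            case 'S': return "signed " + type();   // only used with char
            case 'C': return "const " + type();
            case 'P': return type() + "*";
            case 'R': return type() + "&";
            default:
                if (std::isdigit(static_cast<unsigned char>(c))) {
                    --pos_;  // re-read the length digits
                    return class_name();
                }
                throw std::runtime_error(std::string("unsupported code: ") + c);
        }
    }

    // Decode "<len><chars>", e.g. "3Foo" -> "Foo".
    std::string class_name() {
        std::size_t len = 0;
        while (pos_ < s_.size() &&
               std::isdigit(static_cast<unsigned char>(s_[pos_])))
            len = len * 10 + static_cast<std::size_t>(s_[pos_++] - '0');
        if (pos_ + len > s_.size()) throw std::runtime_error("name too long");
        std::string name = s_.substr(pos_, len);
        pos_ += len;
        return name;
    }

    std::string s_;
    std::size_t pos_ = 0;
};
```

For example, `ParamDemangler("il").params()` returns `int, long` (the parameters of `bar__C3Fooil`), and `ParamDemangler("UlPCcR3Foo").params()` returns `unsigned long, const char*, Foo&`.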
