Compiler Dependencies
Where can I download a free C++ compiler?
Check these out (alphabetically by vendor-name):
- Clang (LLVM)
- Digital Mars.
- DJGPP is a port of GCC 3.2 to DOS, and includes RHIDE, a DOS-based IDE.
- Embarcadero gives away a command line compiler.
- Microsoft Visual C++ command-line compiler (version 2003).
- MinGW has a port of GCC 3.2 that runs on MS Windows.
Also check out these lists:
Where can I get more information on using MFC and Visual C++?
Here are some resources (in no particular order):
www.visionx.com/mfcpro/
www.mvps.org/vcfaq
www.flounder.com/mvp_tips.htm
msdn.microsoft.com/archive/en-us/dnarvc/html/msdn_mfcfaq50.asp
How do I display text in the status bar using MFC?
Use the following code snippet:
CString s = "Text";
CStatusBar* p =
    (CStatusBar*)AfxGetApp()->m_pMainWnd->GetDescendantWindow(AFX_IDW_STATUS_BAR);
if (p != NULL)  // the main window might not have a status bar
    p->SetPaneText(1, s);
This works with MFC v.1.00, which hopefully means it will work with other versions as well.
How can I decompile an executable program back into C++ source code?
You gotta be kidding, right?
Here are a few of the many reasons this is not even remotely feasible:
- What makes you think the program was written in C++ to begin with?
- Even if you are sure it was originally written (at least partially) in C++, which one of the gazillion C++ compilers produced it?
- Even if you know the compiler, which particular version of the compiler was used?
- Even if you know the compiler’s manufacturer and version number, what compile-time options were used?
- Even if you know the compiler’s manufacturer and version number and compile-time options, what third party libraries were linked-in, and what was their version?
- Even if you know all that stuff, most executables have had their debugging information stripped out, so the resulting decompiled code will be totally unreadable.
- Even if you know everything about the compiler, manufacturer, version number, compile-time options, third party libraries, and debugging information, the cost of writing a decompiler that works with even one particular compiler and has even a modest success rate at generating code would be significant, on par with writing the compiler itself from scratch.
But the biggest question is not how you can decompile someone’s code, but why do you want to do this? If you’re trying to reverse-engineer someone else’s code, shame on you; go find honest work. If you’re trying to recover from losing your own source, the best suggestion I have is to make better backups next time.
(Don’t bother writing me email saying there are legitimate reasons for decompiling; I didn’t say there weren’t.)
Where can I get information about the C++ compiler from {IBM, Microsoft, Sun, etc.}?
In alphabetical order by vendor name:
- Clang (LLVM):
clang.llvm.org/
- Comeau C++:
www.comeaucomputing.com/
- Digital Mars (free) C++:
www.digitalmars.com/
- DJ C++ (“DJGPP”):
www.delorie.com/
- Edison Design Group C++:
www.edg.com/cpp.html
- GNU C++ (“g++” or “GCC”):
gcc.gnu.org/
Note: there are two versions precompiled for Win32: Cygwin and MinGW.
- HP C++:
www.tru64unix.compaq.com/linux/compaq_cxx/index.html
- IBM VisualAge C++:
www.ibm.com/software/ad/vacpp/
- Intel Reference C++:
developer.intel.com/software/products/compilers/
- KAI C++:
developer.intel.com/software/products/kcc/
- Metrowerks C++:
metrowerks.com or www.metrowerks.com
- Microsoft Visual C++:
msdn.microsoft.com/visualc/
- Open Watcom C++ (an open-source follow-up to Watcom C++):
www.openwatcom.org/
- Portland Group C++:
www.pgroup.com
- Silicon Graphics C++:
www.sgi.com/developers/devtools/languages/c++.html
- Watcom C++:
www.sybase.com/products/archivedproducts/watcomc
[If anyone has other suggestions that should go into this list, please let us know. Thanks.]
What’s the difference between C++ and Visual C++?
C++ is the language itself; Visual C++ is a compiler that tries to implement the language.
How do compilers use “over-allocation” to remember the number of elements in an allocated array?
Recall that when you delete[] an array, the runtime system magically knows how many destructors to run. This FAQ describes a technique used by some C++ compilers to do this (the other common technique is to use an associative array).
If the compiler uses the “over-allocation” technique, the code for p = new Fred[n] looks something like the following. Note that WORDSIZE is an imaginary machine-dependent constant that is at least sizeof(size_t), possibly rounded up for any alignment constraints. On many machines, this constant will have a value of 4 or 8. It is not a real C++ identifier that will be defined for your compiler.
// Original code: Fred* p = new Fred[n];
char* tmp = (char*) operator new[] (WORDSIZE + n * sizeof(Fred));
Fred* p = (Fred*) (tmp + WORDSIZE);
*(size_t*)tmp = n;
size_t i;
try {
    for (i = 0; i < n; ++i)
        new(p + i) Fred(); // Placement new
}
catch (...) {
    while (i-- != 0)
        (p + i)->~Fred(); // Explicit call to the destructor
    operator delete[] ((char*)p - WORDSIZE);
    throw;
}
Then the delete[] p statement becomes:
// Original code: delete[] p;
size_t n = * (size_t*) ((char*)p - WORDSIZE);
while (n-- != 0)
    (p + n)->~Fred();
operator delete[] ((char*)p - WORDSIZE);
Note that the address passed to operator delete[] is not the same as p.
Compared to the associative array technique, this technique is faster, but more sensitive to the problem of programmers saying delete p rather than delete[] p. For example, if you make a programming error by saying delete p where you should have said delete[] p, the address that is passed to operator delete(void*) is not the address of any valid heap allocation. This will probably corrupt the heap. Bang! You’re dead!
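To make the over-allocation idea concrete, here is a minimal, self-contained sketch that does by hand what such a compiler might do behind the scenes. It is an illustration under stated assumptions, not what any particular compiler emits: the function names newFredArray, storedCount and deleteFredArray are hypothetical, Fred is a made-up class that counts live objects, and WORDSIZE is chosen here as alignof(std::max_align_t) purely so the sketch compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical element type; it counts live objects so the bookkeeping is observable.
struct Fred {
    static int alive;
    Fred()  { ++alive; }
    ~Fred() { --alive; }
};
int Fred::alive = 0;

// Stand-in for the imaginary WORDSIZE (an assumption of this sketch): large
// enough to hold a size_t, and padded so the elements stay suitably aligned.
constexpr std::size_t WORDSIZE = alignof(std::max_align_t);
static_assert(WORDSIZE >= sizeof(std::size_t), "header must hold a size_t");

// Roughly what "new Fred[n]" might expand to under over-allocation.
Fred* newFredArray(std::size_t n) {
    char* tmp = static_cast<char*>(operator new[](WORDSIZE + n * sizeof(Fred)));
    std::memcpy(tmp, &n, sizeof n);            // stash the element count in the header
    Fred* p = reinterpret_cast<Fred*>(tmp + WORDSIZE);
    std::size_t i = 0;
    try {
        for (; i < n; ++i)
            new (p + i) Fred();                // placement new
    } catch (...) {
        while (i-- != 0)
            (p + i)->~Fred();                  // undo the elements already constructed
        operator delete[](tmp);
        throw;
    }
    return p;
}

// Reads back the count hidden just before the first element.
std::size_t storedCount(const Fred* p) {
    std::size_t n;
    std::memcpy(&n, reinterpret_cast<const char*>(p) - WORDSIZE, sizeof n);
    return n;
}

// Roughly what "delete[] p" might expand to.
void deleteFredArray(Fred* p) {
    assert(p != nullptr);
    std::size_t n = storedCount(p);
    while (n-- != 0)
        (p + n)->~Fred();                      // destroy in reverse order
    operator delete[](reinterpret_cast<char*>(p) - WORDSIZE);
}
```

Calling newFredArray(5) and then deleteFredArray on the result should leave Fred::alive back at zero. Note that the pointer handed to the caller is WORDSIZE bytes past the start of the real allocation, which is exactly why freeing it through the wrong form of delete is so dangerous with this technique.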
How do compilers use an “associative array” to remember the number of elements in an allocated array?
Recall that when you delete[] an array, the runtime system magically knows how many destructors to run. This FAQ describes a technique used by some C++ compilers to do this (the other common technique is to over-allocate).
If the compiler uses the associative array technique, the code for p = new Fred[n] looks something like this (where arrayLengthAssociation is the imaginary name of a hidden, global associative array that maps from void* to size_t):
// Original code: Fred* p = new Fred[n];
Fred* p = (Fred*) operator new[] (n * sizeof(Fred));
size_t i;
try {
    for (i = 0; i < n; ++i)
        new(p + i) Fred(); // Placement new
}
catch (...) {
    while (i-- != 0)
        (p + i)->~Fred(); // Explicit call to the destructor
    operator delete[] (p);
    throw;
}
arrayLengthAssociation.insert(p, n);
Then the delete[] p statement becomes:
// Original code: delete[] p;
size_t n = arrayLengthAssociation.lookup(p);
while (n-- != 0)
    (p + n)->~Fred();
operator delete[] (p);
Cfront uses this technique (it uses an AVL tree to implement the associative array).
Compared to the over-allocation technique, the associative array technique is slower, but less sensitive to the problem of programmers saying delete p rather than delete[] p. For example, if you make a programming error by saying delete p where you should have said delete[] p, only the first Fred in the array gets destructed, but the heap may survive (unless you’ve replaced operator delete[] with something that doesn’t simply call operator delete, or unless the destructors for the other Fred objects were necessary).
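The associative array technique can be imitated by hand as well. The sketch below is only an approximation under stated assumptions: it uses std::map as a stand-in for the hidden table (Cfront used its own AVL tree), the accessor arrayLengthAssociation() and the helper names newFredArray and deleteFredArray are hypothetical, and Fred is a made-up class that counts live objects. Unlike the pseudocode above, the table entry is recorded inside the try block so that a failure to record it is cleaned up too.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <new>

// Hypothetical element type; it counts live objects so the bookkeeping is observable.
struct Fred {
    static int alive;
    Fred()  { ++alive; }
    ~Fred() { --alive; }
};
int Fred::alive = 0;

// Stand-in for the hidden global table that maps from void* to size_t.
std::map<void*, std::size_t>& arrayLengthAssociation() {
    static std::map<void*, std::size_t> table;
    return table;
}

// Roughly what "new Fred[n]" might expand to with an associative array.
Fred* newFredArray(std::size_t n) {
    Fred* p = static_cast<Fred*>(operator new[](n * sizeof(Fred)));
    std::size_t i = 0;
    try {
        for (; i < n; ++i)
            new (p + i) Fred();                // placement new
        arrayLengthAssociation()[p] = n;       // remember the length, keyed by address
    } catch (...) {
        while (i-- != 0)
            (p + i)->~Fred();                  // undo the elements already constructed
        operator delete[](p);
        throw;
    }
    return p;
}

// Roughly what "delete[] p" might expand to.
void deleteFredArray(Fred* p) {
    std::map<void*, std::size_t>& table = arrayLengthAssociation();
    std::map<void*, std::size_t>::iterator it = table.find(p);
    assert(it != table.end());                 // p must have come from newFredArray
    std::size_t n = it->second;
    table.erase(it);
    while (n-- != 0)
        (p + n)->~Fred();                      // destroy in reverse order
    operator delete[](p);
}
```

Here the pointer handed to the caller is the very address returned by operator new[], so the allocation itself stays intact even if the length lookup is skipped. The price is a table lookup on every array allocation and deallocation.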
If name mangling was standardized, could I link code compiled with compilers from different compiler vendors?
Short answer: Probably not.
In other words, some people would like to see name mangling standards incorporated into the proposed C++ ANSI standards in an attempt to avoid having to purchase different versions of class libraries for different compiler vendors. However, name mangling differences are one of the smallest differences between implementations, even on the same platform.
Here is a partial list of other differences:
- Number and type of hidden arguments to member functions.
  - is this handled specially?
  - where is the return-by-value pointer passed?
- Assuming a v-table is used:
  - what is its contents and layout?
  - where/how is the adjustment to this made for multiple and/or virtual inheritance?
- How are classes laid out, including:
  - location of base classes?
  - handling of virtual base classes?
  - location of v-pointers, if they are used at all?
- Calling convention for functions, including:
  - where are the actual parameters placed?
  - in what order are the actual parameters passed?
  - how are registers saved?
  - where does the return value go?
  - does caller or callee pop the stack after the call?
  - special rules for passing or returning structs or doubles?
  - special rules for saving registers when calling leaf functions?
- How is the run-time-type-identification laid out?
- How does the runtime exception handling system know which local objects need to be destructed during an exception throw?
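A couple of the layout questions in the list above can be probed from inside a program. The following sketch is only an illustration: the class names and the helpers hasHiddenData, base2Offset and checkRoundTrip are hypothetical, and the numbers they report are whatever your particular compiler happens to choose, which is precisely the point.

```cpp
#include <cassert>
#include <cstddef>

// Two classes with the same declared data; one also has a virtual function.
struct Plain   { int x; };
struct Virtual { int x; virtual ~Virtual() {} };

// A class with two polymorphic bases, to observe the adjustment of this.
struct Base1 { int a; virtual ~Base1() {} };
struct Base2 { int b; virtual ~Base2() {} };
struct Derived : Base1, Base2 { int c; };

// True if this compiler adds hidden data (typically a v-pointer) to Virtual.
bool hasHiddenData() {
    return sizeof(Virtual) > sizeof(Plain);
}

// Byte distance from the start of a Derived object to its Base2 subobject.
// The value is implementation-specific; the language does not fix it.
std::ptrdiff_t base2Offset(Derived& d) {
    Base2* b2 = &d;                            // the conversion may adjust the address
    return reinterpret_cast<char*>(b2) - reinterpret_cast<char*>(&d);
}

// The one thing the language does guarantee: converting back yields the original object.
void checkRoundTrip(Derived& d) {
    Base2* b2 = &d;
    assert(static_cast<Derived*>(b2) == &d);
}
```

Two compilers can legitimately give different answers for hasHiddenData() and base2Offset(), so object code from one cannot safely manipulate objects created by the other, no matter how the function names are mangled.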
GNU C++ (g++) produces big executables for tiny programs; Why?
libg++ (the library used by g++) was probably compiled with debug info (-g). On some machines, recompiling libg++ without debugging can save lots of disk space (approximately 1 MB; the down-side: you’ll be unable to trace into libg++ calls). Merely strip-ping the executable doesn’t reclaim as much as recompiling without -g followed by subsequent strip-ping the resultant a.out’s.
Use size a.out to see how big the program code and data segments really are, rather than ls -s a.out which includes the symbol table.
Is there a yacc-able C++ grammar?
The primary yacc grammar you’ll want is from Ed Willink. Ed believes his grammar is fully compliant with the ISO/ANSI C++ standard, however he doesn’t warrant it: “the grammar has not,” he says, “been used in anger.” You can get the grammar without action routines or the grammar with dummy action routines. You can also get the corresponding lexer. For those who are interested in how he achieves a context-free parser (by pushing all the ambiguities plus a small number of repairs to be done later after parsing is complete), you might want to read chapter 4 of his thesis.
There is also a very old yacc grammar that doesn’t support templates, exceptions, nor namespaces; plus it deviates from the core language in some subtle ways. You can get that grammar here or here.
What is C++ 1.2? 2.0? 2.1? 3.0?
These are not versions of the language, but rather versions of Cfront, which was the original C++ translator implemented by AT&T. It has become generally accepted to use these version numbers as if they were versions of the language itself.
Very roughly speaking, these are the major features:
- 2.0 includes multiple/virtual inheritance and pure virtual functions
- 2.1 includes semi-nested classes and delete[] pointerToArray
- 3.0 includes fully-nested classes, templates and i++ vs. ++i
- 4.0 will include exceptions
Is it possible to convert C++ to C?
Depends on what you mean. If you mean, Is it possible to convert C++ to readable and maintainable C-code? then sorry, the answer is No: C++ features don’t directly map to C, plus the generated C code is not intended for humans to follow. If instead you mean, Are there compilers which convert C++ to C for the purpose of compiling onto a platform that doesn’t yet have a C++ compiler? then you’re in luck; keep reading.
A compiler which compiles C++ to C does full syntax and semantic checking on the program, and just happens to use C code as a way of generating object code. Such a compiler is not merely some kind of fancy macro processor. (And please don’t email me claiming these are preprocessors — they are not — they are full compilers.) It is possible to implement all of the features of ISO Standard C++ by translation to C, and except for exception handling, it typically results in object code with efficiency comparable to that of the code generated by a conventional C++ compiler.
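To give a feel for what translation to C involves, here is a hand-written sketch of how one virtual function call might be lowered to plain structs and function pointers. It is not the output of any real translator: the names Shape_vtable, Square_vtbl, Square_ctor and Shape_area are invented for illustration, and the code sticks to C-style constructs (apart from the casts and the assert) while remaining valid C++.

```cpp
#include <cassert>

// Hypothetical C++ source being translated:
//   struct Shape  { virtual int area() const = 0; };
//   struct Square : Shape { int side; int area() const { return side * side; } };

struct Shape;

// One table of function pointers per class with virtual functions.
struct Shape_vtable {
    int (*area)(const Shape* self);
};

// The hidden v-pointer becomes an explicit member.
struct Shape {
    const Shape_vtable* vptr;
};

// The base-class subobject is placed first, so a Square* can be viewed as a Shape*.
struct Square {
    Shape base;
    int   side;
};

// Square::area() as a free function taking an explicit "this".
static int Square_area(const Shape* self) {
    const Square* sq = reinterpret_cast<const Square*>(self);  // a plain cast in real C
    return sq->side * sq->side;
}

static const Shape_vtable Square_vtbl = { Square_area };

// The constructor's hidden work: install the v-pointer, then initialize the members.
void Square_ctor(Square* self, int side) {
    self->base.vptr = &Square_vtbl;
    self->side = side;
}

// What a call such as "s->area()" turns into: an indirect call through the table.
int Shape_area(const Shape* s) {
    assert(s != 0 && s->vptr != 0);
    return s->vptr->area(s);
}
```

After Square_ctor(&sq, 4), the call Shape_area(&sq.base) dispatches through the table to Square_area. Even this tiny case shows why the generated C is not pleasant to read or maintain: every language feature is spelled out as bookkeeping that the C++ compiler would otherwise hide.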
Here are some products that perform compilation to C (note: if you know of any other products that do this, please let us know):
- Comeau Computing offers a compiler based on Edison Design Group’s front end that outputs C code.
- LLVM is a downloadable compiler that emits C code. See also here and here.
- Cfront, the original implementation of C++, done by Bjarne Stroustrup and others at AT&T, generates C code. However it has two problems: it’s been difficult to obtain a license since the mid 90s when it started going through a maze of ownership changes, and development ceased at that same time and so it doesn’t get bug fixes and doesn’t support any of the newer language features (e.g., exceptions, namespaces, RTTI, member templates).
- Contrary to popular myth, as of this writing there is no version of g++ that translates C++ to C. Such a thing seems to be doable, but I am not aware that anyone has actually done it (yet).
Note that you typically need to specify the target platform’s CPU, OS and C compiler so that the generated C code will be specifically targeted for this platform. This means: (a) you probably can’t take the C code generated for platform X and compile it on platform Y; and (b) it’ll be difficult to do the translation yourself — it’ll probably be a lot cheaper/safer with one of these tools.
One more time: do not email me saying these are just preprocessors — they are not — they are compilers.