Improvements to SymbolSort tool for C++ code bloat analysis

It is well known that code bloat significantly increases build times in C++. The question is how to quantify it. How can we see which things are compiled too many times, or which things generate too much object code? How can we know which places would benefit most from improvement efforts?

This article is about an excellent tool called SymbolSort, which produces useful code bloat statistics in a project of any size. In particular, the article describes a few improvements which I have implemented recently to provide more useful analysis.

Manual

The simplest approach to see what exactly is compiled is to enable generation of assembly listings (/FA or /FAs on MSVC) and look through them from time to time. In case of MSVC, all generated symbols are listed at the top of the .asm file as PUBLIC symbols, e.g.:

; Listing generated by Microsoft (R) Optimizing Compiler Version 18.00.40629.0

include listing.inc

INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES

PUBLIC  hypot
PUBLIC  main
PUBLIC  ??$calcSum@H@@YAHPEBHH@Z                                            ; calcSum<int>
PUBLIC  ?run@?$Bar@H$06@@QEBAHXZ                                            ; Bar<int,7>::run
PUBLIC  ??$accumulate@PEBHH@std@@YAHPEBH0H@Z                                ; std::accumulate<int const * __ptr64,int>
PUBLIC  ??$accumulate@PEBHHU?$plus@X@std@@@std@@YAHPEBH0HU?$plus@X@0@@Z     ; std::accumulate<int const * __ptr64,int,std::plus<void> >
PUBLIC  ??$_Unchecked@PEBH@std@@YAPEBHPEBH@Z                                ; std::_Unchecked<int const * __ptr64>
PUBLIC  ??$_Accumulate@PEBHHU?$plus@X@std@@@std@@YAHPEBH0HU?$plus@X@0@@Z    ; std::_Accumulate<int const * __ptr64,int,std::plus<void> >
PUBLIC  ??$?RAEAHAEBH@?$plus@X@std@@QEBA?BHAEAHAEBH@Z                       ; std::plus<void>::operator()<int & __ptr64,int const & __ptr64>
PUBLIC  ??_C@_03PMGGPEJJ@?$CFd?6?$AA@                                       ; `string'
PUBLIC  __xmm@00000004000000030000000200000001
EXTRN   printf:PROC
EXTRN   _hypot:PROC
EXTRN   __GSHandlerCheck:PROC
EXTRN   __security_check_cookie:PROC
EXTRN   __security_cookie:QWORD
EXTRN   _fltused:DWORD
...

It is especially insightful to take a small .cpp file which has a suspiciously large .obj file and look into the generated assembly. In the simplest case, it is enough to look through the function symbols, although there is a lot of other stuff: constants of various types and exception handlers. Also note that EXTRN symbols are only referenced from this file: they are defined elsewhere, so no code is generated for them here.

Anyway, this approach does not provide statistics over the whole project. It can help to gain understanding of what happens, but it requires a good guess to actually notice a problem. And without changing the code there is no way to see how big the problem is.
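To make the manual approach a bit less tedious, the PUBLIC lines can be pulled out of a listing automatically. Below is a minimal Python sketch, assuming the listing has the layout shown above (a decorated name, optionally followed by a readable name after a semicolon). The function name public_symbols is made up for illustration.

```python
import re

# Matches lines such as:
#   PUBLIC  ??$calcSum@H@@YAHPEBHH@Z    ; calcSum<int>
PUBLIC_RE = re.compile(r"^PUBLIC\s+(\S+)(?:\s*;\s*(.*))?$")

def public_symbols(listing_text):
    """Return (decorated, readable) pairs for every PUBLIC line."""
    symbols = []
    for line in listing_text.splitlines():
        match = PUBLIC_RE.match(line.rstrip())
        if match:
            decorated, comment = match.group(1), match.group(2)
            # Plain C names such as "main" have no readable-name comment.
            symbols.append((decorated, comment or decorated))
    return symbols

# A few lines copied from the listing above.
SAMPLE = """\
PUBLIC  hypot
PUBLIC  main
PUBLIC  ??$calcSum@H@@YAHPEBHH@Z                                            ; calcSum<int>
PUBLIC  ?run@?$Bar@H$06@@QEBAHXZ                                            ; Bar<int,7>::run
EXTRN   printf:PROC
"""

symbols = public_symbols(SAMPLE)
readable_names = [readable for _, readable in symbols]
for name in readable_names:
    print(name)
```

For a real file, pass the contents of the .asm listing instead of SAMPLE. This only lists the symbols; it says nothing about their sizes, which is exactly the limitation discussed above.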

SymbolSort

SymbolSort is a tool created by Adrian Stone, originally announced on his blog and later released publicly on GitHub. I recommend reading the original blog article for a detailed description of its options and results. In general, the tool parses a set of input files, extracts information about all symbols contained in them, and then prints out various types of statistics.

SymbolSort supports two major types of input files:

  1. A dump of an object file (or files), produced either by dumpbin.exe (Visual C++) or by nm (GNU binutils). This type of input is most useful for analyzing build times, since it counts symbols before the linker deduplicates them. In this article, only the MSVC platform is considered.

  2. A module (either executable or dynamic library) or a .pdb file with debug symbols. As far as I understand, analyzing an .exe file is equivalent to analyzing its .pdb file. Since this is the final binary with all the symbols deduplicated, this type of analysis is most useful for reducing executable size and improving code cache performance, but not for improving build performance.

To demonstrate SymbolSort's capabilities, I'll apply it to a real-world C++ project. I'll try to analyze The Dark Mod (TDM), which is an open source standalone game based on the Doom 3 engine. It is very important to note that the Doom 3 engine originally does not use STL at all: it has its own library called "idLib" with greatly simplified custom alternatives to STL stuff. However, the Dark Mod coders used STL in many places throughout their code, so we will definitely see it in the analysis too.

Demo: exe

First let's run SymbolSort on the final TDM executable:

SymbolSort -in TheDarkModx64.exe -out TheDarkModx64.exe.report

After a few seconds, we can see the result in TheDarkModx64.exe.report (the full file is much longer):

Raw Symbols
Total Count  : 64112
Total Size   : 60215952
Unattributed : 2750964
--------------------------------------
Sorted by Size
        Size Section/Type  Name                     Source
    16777216          bss  optEdges                 c:\thedarkmod\darkmod_src\tools\compilers\dmap\optimize.cpp
     7834688          bss  sessLocal                c:\thedarkmod\darkmod_src\framework\session.cpp
     6887456          bss  gameLocal                c:\thedarkmod\darkmod_src\game\game_local.cpp
     6291456          bss  optVerts                 c:\thedarkmod\darkmod_src\tools\compilers\dmap\optimize.cpp
     3407872          bss  outputTris               c:\thedarkmod\darkmod_src\tools\compilers\dmap\shadowopt3.cpp
     2359296          bss  silQuads                 c:\thedarkmod\darkmod_src\tools\compilers\dmap\shadowopt3.cpp
     1572864          bss  shadowVerts              c:\thedarkmod\darkmod_src\renderer\tr_stencilshadow.cpp
     1572864          bss  silEdges                 c:\thedarkmod\darkmod_src\tools\compilers\dmap\shadowopt3.cpp
      786432          bss  rb_debugLines            c:\thedarkmod\darkmod_src\renderer\tr_rendertools.cpp
      589824          bss  EventPool                c:\thedarkmod\darkmod_src\game\gamesys\event.cpp
      576360         data  localConsole             c:\thedarkmod\darkmod_src\framework\console.cpp
      524288          bss  udpPorts                 c:\thedarkmod\darkmod_src\sys\win32\win_net.cpp
      458752          bss  rb_debugPolygons         c:\thedarkmod\darkmod_src\renderer\tr_rendertools.cpp
      393216          bss  shadowIndexes            c:\thedarkmod\darkmod_src\renderer\tr_stencilshadow.cpp
    ...             ...  ...                      ...

Note that the file TheDarkModx64.exe is only 10 MB, while the report says that there are 60 MB of symbols. Moreover, the largest symbols are attributed to the .bss section, meaning that they are global or static variables (zero-initialized data in .bss takes no space in the file itself, which explains the difference). It seems that idTech4 loves global variables too much. In order to work around this problem, we can use the newly added -sections parameter to keep only code symbols:

SymbolSort -in TheDarkModx64.exe -out TheDarkModx64.exe.report -sections code

This gives us much more useful statistics, which include only the machine code. SymbolSort produces many different lists of symbols (or groups of symbols), sorted by total count, size or name. You can download this complete report and look into it yourself. Here I will pay close attention to only two lists.

The first is the list of file contributions, which shows how much code or symbols are generated for each source code file, also with information about directories:

File Contributions
--------------------------------------
Sorted by Size
        Size   Count  Source Path
     6028867   33730  c:
     5869935   31632  c:\thedarkmod\darkmod_src
     2957843   16774  c:\thedarkmod\darkmod_src\game
     1194434    7034  c:\thedarkmod\darkmod_src\tools
      631682    3243  c:\thedarkmod\darkmod_src\game\ai
      581988    3156  c:\thedarkmod\darkmod_src\tools\radiant
      541960    2149  c:\thedarkmod\darkmod_src\idlib
      509794    1878  c:\thedarkmod\darkmod_src\renderer
      338432    1156  c:\thedarkmod\darkmod_src\game\physics
      280497    1768  c:\thedarkmod\darkmod_src\framework
      249891     458  c:\thedarkmod\darkmod_src\idlib\math
      230854     665  c:\thedarkmod\darkmod_src\tools\compilers
      172726     499  c:\thedarkmod\darkmod_src\game\ai\ai.cpp
      168365    1477  c:\thedarkmod\darkmod_src\game\gamesys
      163355     814  c:\thedarkmod\darkmod_src\game\ai\states
      158932    2098  c:\program files (x86)\microsoft visual studio 12.0\vc
      156801    2002  c:\program files (x86)\microsoft visual studio 12.0\vc\include
      153308    1057  c:\thedarkmod\darkmod_src\game\entity.cpp
      138646     933  c:\thedarkmod\darkmod_src\ui
    ...        ...  ...
       76322     506  c:\program files (x86)\microsoft visual studio 12.0\vc\include\xtree
    ...        ...  ...

As you see, it is easy to learn how much code space is taken by each game subsystem. The whole code is 6 MB. The game logic itself (darkmod_src\game) occupies half of the code space (3 MB), while various tools (darkmod_src\tools) take only about 20% (1.2 MB). This view is also the most convenient way to see how much code is generated for the MSVC standard library: only 160 KB is taken by non-inlined standard library functions, with almost half of it (75 KB) generated by the xtree header. This header contains the implementation of std::set and std::map, which often constitutes most of the machine code coming from STL.
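The percentages quoted above are easy to recompute from the table. Here is a small Python sketch of the arithmetic; the byte counts are copied from the "File Contributions" list, while the dictionary labels are just illustrative shorthand.

```python
# Sizes in bytes, copied from the "File Contributions" table above.
contributions = {
    "total (c:)": 6028867,
    r"darkmod_src\game": 2957843,
    r"darkmod_src\tools": 1194434,
    r"vc\include (standard library)": 156801,
    r"vc\include\xtree": 76322,
}

total = contributions["total (c:)"]

def share(path):
    """Fraction of all code bytes attributed to the given path."""
    return contributions[path] / total

for path, size in contributions.items():
    print(f"{size:>9}  {share(path):6.1%}  {path}")

# How much of the standard library code comes from the xtree header alone.
xtree_share_of_stl = (
    contributions[r"vc\include\xtree"]
    / contributions[r"vc\include (standard library)"]
)
print(f"xtree share of standard library code: {xtree_share_of_stl:.1%}")
```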

Another useful statistic shows how much space is taken by all instances of a template function or method. Here it is (sorted by size):

Merged Template Symbols
Merged Count  : 26353
--------------------------------------
Sorted by Total Count
    ...

Sorted by Total Size
  Total Size  Total Count  Name
      5027         2160  `dynamic initializer for '...''
      2032         2030  `dynamic atexit destructor for '...''
      6417          220  public: void __cdecl idList<T>::Resize(int) __ptr64
      5410           67  ai::`dynamic initializer for '...''
      2087           17  protected: class std::_Tree_iterator<T> __cdecl std::_Tree<T>::_Insert_at<T>(bool,struct std::_Tree_node<T> * __ptr64,struct std::pair<T> && __ptr64,struct std::_Nil) __ptr64
      1261           26  public: class std::_Tree_iterator<T> __cdecl std::_Tree<T>::erase(class std::_Tree_const_iterator<T>) __ptr64
      8678           16  public: class A0xa82155d7::xpath_ast_node * __ptr64 __cdecl `anonymous namespace'
      8577           13  protected: class std::_Tree_iterator<T> __cdecl std::_Tree<T>::_Insert_hint<T>(class std::_Tree_const_iterator<T>,struct std::pair<T> & __ptr64,struct std::_Tree_node<T> * __ptr64) __ptr64
      7112           18  protected: struct std::pair<T> __cdecl std::_Tree<T>::_Insert_nohint<T>(bool,struct std::pair<T> && __ptr64,struct std::_Nil) __ptr64
      6563           19  void __cdecl `anonymous namespace'
      6286           12  private: class A0xa82155d7::xpath_node_set_raw __cdecl `anonymous namespace'
      5662          140  `std::shared_ptr<T>::_Resetp<T>'...'::catch$0
      5504           13  protected: struct std::pair<T> __cdecl std::_Tree<T>::_Insert_nohint<T>(bool,struct std::pair<T> & __ptr64,struct std::_Tree_node<T> * __ptr64) __ptr64
      5338            4  private: static bool __cdecl `anonymous namespace'
      4883           13  protected: class std::_Tree_iterator<T> __cdecl std::_Tree<T>::_Insert_at<T>(bool,struct std::_Tree_node<T> * __ptr64,struct std::pair<T> & __ptr64,struct std::_Tree_node<T> * __ptr64) __ptr64
      ...           ...  ...

Let's leave aside the problem of dynamic initializers and destructors: those are most likely initialization code for global and static variables, and idTech4 has a lot of them. The idList<T>::Resize line shows that this method has 220 instances with a total size of 6.4 KB. The std::_Tree lines show various methods of the std::_Tree template (which is the implementation of std::set and std::map). Information like this is especially useful for generic template classes which are used a lot in the project.
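To get a feeling for how such merging can work, here is a minimal Python sketch that collapses every template argument list to <T> and then sums sizes per merged name. It is an illustration only, not SymbolSort's actual implementation; the function name and the sizes in the sample data are invented. The naive bracket counting also breaks on names such as operator< or operator->, which a real tool has to treat specially.

```python
from collections import defaultdict

def merge_template_args(name):
    """Replace every top-level <...> argument list with <T>."""
    out = []
    depth = 0
    for ch in name:
        if ch == "<":
            if depth == 0:
                out.append("<T>")
            depth += 1
        elif ch == ">":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)

# (undecorated name, size in bytes); the sizes are invented.
symbols = [
    ("idList<int>::Resize(int)", 300),
    ("idList<class idStr>::Resize(int)", 340),
    ("std::_Tree<class std::_Tset_traits<int,struct std::less<int> > >::clear(void)", 200),
    ("idStr::Init(void)", 120),
]

merged = defaultdict(lambda: [0, 0])  # merged name -> [total size, count]
for name, size in symbols:
    entry = merged[merge_template_args(name)]
    entry[0] += size
    entry[1] += 1

for name, (total_size, count) in sorted(merged.items(), key=lambda kv: -kv[1][0]):
    print(f"{total_size:>8} {count:>6}  {name}")
```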

One question which easily comes to mind is: "how much code is generated for the whole std::_Tree class?" The stock version of SymbolSort cannot answer this question.

Demo: obj

As mentioned before, the analysis of the executable may be useful for optimizing instruction caching, but it does not represent well where the build time is spent. Every function defined in a header is compiled independently as many times as there are source files using it (read more here), with duplicates being merged by the linker later. Writing code in headers may be necessary for inlining, and also for templates. And even though idTech4 was written a long time ago by people mainly proficient with C, it still contains some templates.

To better estimate build time, we should analyze .obj files instead of the executable. Every .obj file represents all the code generated by the compiler for the corresponding translation unit, so if we sum up the stats across all .obj files, we will know how much code was compiled (including duplicates). It is probably better to do so for a debug build with inlining disabled, because even though inlined functions can be omitted from .obj files, they still take the compiler's time.

In order to run SymbolSort on .obj files, we have to 1) run dumpbin.exe /headers on every .obj file and 2) list all the resulting symbol files as arguments to SymbolSort. It is also allowed to concatenate all symbol files into one file all_obj.smb. Here is an example:

SymbolSort -in:comdat .\DarkModTools\all_obj.smb -in:comdat .\idLib\all_obj.smb -out object_files.report
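The dumping step can be automated with a few lines of Python. The sketch below is a simplified illustration and not the code of the bloatinfo.py script; the function names are made up, and it assumes dumpbin.exe is available in PATH (for example, in a Visual Studio command prompt).

```python
import subprocess
from pathlib import Path

def dumpbin_command(obj_path):
    """Command line that dumps the headers of one object file."""
    return ["dumpbin.exe", "/headers", str(obj_path)]

def find_obj_files(root):
    """All .obj files under root, in a stable order."""
    return sorted(Path(root).rglob("*.obj"))

def dump_all(root, output="all_obj.smb"):
    """Run dumpbin on every .obj under root and concatenate the dumps."""
    out_path = Path(root) / output
    with open(out_path, "w", encoding="utf-8", errors="replace") as out:
        for obj in find_obj_files(root):
            result = subprocess.run(
                dumpbin_command(obj),
                capture_output=True, text=True, check=True,
            )
            out.write(result.stdout)
    return out_path
```

Calling dump_all on a project's intermediate directory (say, a hypothetical .\idLib) would produce an all_obj.smb file that can be passed to SymbolSort with -in:comdat, as in the command above.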

To simplify invoking SymbolSort, I have created the bloatinfo.py script. In order to get a single report over several projects, just run the following command in the directory containing their .obj files:

bloatinfo --obj=. --dump --analyze

The script calls dumpbin internally, merges symbol files for each project and runs SymbolSort on them. The results are put into object_files.report in the current directory (you can download the full report):

Raw Symbols
Total Count  : 126533
Total Size   : 21001307
Unattributed : 0
--------------------------------------
Sorted by Size
        Size Section/Type  Name                                                                                                                      Source
    106356     .text$mn  public: bool __cdecl idClass::ProcessEventArgPtr(class idEventDef const *,__int64 *)                                      .\DarkModTools\Class.obj
    65536         .bss  char (* `char * __cdecl va(char const *,...)'::`2'::string)[16384]                                                        .\idLib\Str.obj
    65536         .bss  char (* `public: static char const * __cdecl idStr::FloatArrayToString(float const *,int,int)'::`2'::str)[16384]          .\idLib\Str.obj
    65536         .bss  char (* `public: static char const * __cdecl idTypeInfoTools::OutputString(char const *)'::`2'::buffers)[16384]           .\DarkModTools\TypeInfo.obj
    65536         .bss  class idClipModel * * `int __cdecl GetObstacles(class idPhysics const *,class idAAS const *,class idEntity const *,int,class idVec3 const &,class idVec3 const &,struct obstacle_s *,int,class idBounds &,struct obstaclePath_s &)'::`5'::clipModelList  .\DarkModTools\AI_pathing.obj
    65536         .bss  class idEntity * * `public: float __cdecl idPush::ClipRotationalPush(struct trace_s &,class idEntity *,int,class idMat3 const &,class idRotation const &)'::`2'::entityList  .\DarkModTools\Push.obj
    65536         .bss  class idEntity * * `public: void __cdecl idPlayer::PerformFrobCheck(void)'::`42'::frobRangeEnts                           .\DarkModTools\Player.obj
    64768         .bss  struct polyhedron * `struct polyhedron __cdecl make_sv(struct polyhedron const &,class idVec4)'::`2'::lut                 .\DarkModTools\tr_shadowbounds.obj
    62651     .text$mn  public: bool __cdecl idMat6::InverseSelf(void)                                                                            .\idLib\Matrix.obj
    ...            ...  ...

As usual, there are many .bss symbols here. The total size by sections is:

Merged Sections / Types
Merged Count  : 10
--------------------------------------
Sorted by Total Count
  Total Size  Total Count  Name
    15530261        84799  .text$mn
     2288278        15786  .rdata
      780809         9632  .text$x
      211188         7545  .rdata$r
      321431         2609  .text$di
      104892         2289  .data$r
       86136         2206  .text$yd
     1629712         1258  .bss
       48392          383  .data
         208           26  .CRT$XCU

As you see, there are only about 1.6 MB of such symbols, not 50 MB as we saw in the executable. I have no idea why the other global variables are not shown here.

Anyway, we have 15.5 MB of machine code compiled in total over all object files, compared with 11.5 MB of code in TheDarkModx64.exe when built in the Debug configuration. This gives us a nice overcompilation ratio of about 4/3, which is very good. On my initial run I even got a ratio less than one, which happened because function-level linking was not enabled, so some symbols were missing from the dumpbin outputs. A rule of thumb is: enable function-level linking (/Gy on MSVC) for .obj analysis, otherwise non-template non-inline stuff will be missing from the report!
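The ratio itself is just the amount of code compiled divided by the amount of code that survives linking. As a quick check of the arithmetic in Python (the first number is the .text$mn total from the table above, the second is the rounded figure quoted in the text):

```python
MB = 1_000_000  # decimal megabytes, matching the rounded figures in the text

compiled_code = 15_530_261   # .text$mn total over all .obj files
linked_code = 11.5 * MB      # code in the Debug executable (rounded)

# How many times, on average, each linked byte of code was compiled.
ratio = compiled_code / linked_code
print(f"overcompilation ratio: {ratio:.2f}")
```

The result is close to 4/3, that is, every third byte of compiled code is thrown away by the linker as a duplicate.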

I think such a low overcompilation ratio was only possible due to the "C with classes" mindset. Modern C++ programming with template abuse is much worse in this regard. For instance, one project of OpenCollada has an overcompilation ratio of 9/2, which means that each piece of code has to be independently compiled 4.5 times on average. Actually, most of this bloat in object files comes from a few templates which have a crazy overcompilation ratio (close to one hundred).

But let's continue with TheDarkMod. The "file contributions" section in the object_files.report is not revealing:

File Contributions
--------------------------------------
Sorted by Size
        Size   Count  Source Path
    20976534  126384  .
    19193802  120624  .\DarkModTools
     1782732    5760  .\idLib
      477915    2320  .\DarkModTools\AI.obj
      439014    1926  .\DarkModTools\Player.obj
      432376    2643  .\DarkModTools\Entity.obj
      333805    2241  .\DarkModTools\Game_local.obj
      318057     399  .\idLib\Matrix.obj
      254341    1203  .\DarkModTools\pugixml.obj
      238372    1041  .\DarkModTools\Physics_AF.obj
      189301     510  .\DarkModTools\EditorBrush.obj
      184638     886  .\DarkModTools\MainFrm.obj
      174207    1138  .\DarkModTools\Actor.obj
      163010     641  .\DarkModTools\Window.obj
      ...        ...  ...

Here machine code is classified by object file. If some class like idList generates a lot of bloat because its methods are defined in a header, we won't see it here, because every .obj file contains at most one copy of each of these methods. It turns out that the nastiest code bloaters are evenly spread across .obj files, making this "file contributions" section useless.
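A toy example in Python shows why grouping by object file hides this kind of bloat. All records below are invented: a header-defined method appears once in each of three object files, and only grouping by symbol name reveals that it is the biggest contributor.

```python
from collections import Counter

# (object file, symbol name, size in bytes); all values are invented.
records = [
    ("AI.obj",     "idList<T>::Resize(int)", 400),
    ("AI.obj",     "idAI::Think(void)",      900),
    ("Player.obj", "idList<T>::Resize(int)", 400),
    ("Player.obj", "idPlayer::Think(void)",  800),
    ("Entity.obj", "idList<T>::Resize(int)", 400),
    ("Entity.obj", "idEntity::Think(void)",  700),
]

by_object = Counter()
by_symbol = Counter()
for obj, symbol, size in records:
    by_object[obj] += size      # what "file contributions" shows
    by_symbol[symbol] += size   # what a per-symbol view shows

top_symbol, top_size = by_symbol.most_common(1)[0]
print("per object file:", dict(by_object))
print("largest symbol overall:", top_symbol, top_size)
```

In the per-object view the duplicated method is a minor part of every file, yet summed over all files it outweighs each of the "big" functions.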

A better view is provided by "merged template symbols" section:

Merged Template Symbols
Merged Count  : 39841
--------------------------------------

Sorted by Total Size
  Total Size  Total Count  Name
      865280          104  protected: static struct ATL::CTrace::CategoryMap * ATL::CTrace::m_nMap
      482057         6611  `string'
      297459         2299  void __cdecl `dynamic initializer for '...''
      243853          107  public: class std::_Tree_iterator<T> __cdecl std::_Tree<T>::erase(class std::_Tree_const_iterator<T>)
      207875          523  public: void __cdecl idList<T>::Resize(int)
      139609           83  protected: struct std::pair<T> __cdecl std::_Tree<T>::_Insert_nohint<T>(bool,struct std::pair<T> &&,struct std::_Nil)
      124320           21  protected: class std::_Tree_iterator<T> __cdecl std::_Tree<T>::_Insert_hint<T>(class std::_Tree_const_iterator<T>,struct std::pair<T> &,struct std::_Tree_node<T> *)
       91184           82  protected: class std::_Tree_iterator<T> __cdecl std::_Tree<T>::_Insert_at<T>(bool,struct std::_Tree_node<T> *,struct std::pair<T> &&,struct std::_Nil)
       80271          679  public: void __cdecl idList<T>::Clear(void)
       79856         2045  void __cdecl `dynamic atexit destructor for '...''
       70246          103  public: class std::_Tree_iterator<T> __cdecl std::_Tree<T>::erase(class std::_Tree_const_iterator<T>,class std::_Tree_const_iterator<T>)
       59325          525  public: __cdecl idList<T>::idList<T>(int)
       50102           94  public: void __cdecl std::basic_string<T>::_Copy(unsigned __int64,unsigned __int64)
       43036          106  public: class std::_Tree_const_iterator<T> & __cdecl std::_Tree_const_iterator<T>::operator--(void)
       42840          126  protected: void __cdecl std::_Tree<T>::_Lrotate(struct std::_Tree_node<T> *)
       42840          126  protected: void __cdecl std::_Tree<T>::_Rrotate(struct std::_Tree_node<T> *)
       42284          341  protected: void __cdecl idStr::Init(void)
       41460           28  private: void __cdecl `anonymous namespace'
       37736          106  public: struct std::_Tree_node<T> * __cdecl std::_Tree_buy<T>::_Buynode0(void)
       35658          126  public: bool __cdecl std::_Tree_const_iterator<T>::operator==(class std::_Tree_const_iterator<T> const &)const 
       35406           42  public: bool __cdecl idBounds::AddPoint(class idVec3 const &)
       ...            ...  ...

Note that the very first symbol, ATL::CTrace::m_nMap, is a .bss symbol, so it does not take the compiler's time. The second symbol, `string', corresponds to all compiled string constants, so it is not code either. Among the rest, std::_Tree pieces are consistently at the top =) Some methods of idList<T> are also noticeable. It would be great to know how much code is generated for the idList class in total, but the original SymbolSort does not provide such insight.

Improvements

As you see above, the original version of SymbolSort lacks one important feature on .obj input: you cannot learn how much code a template class is generating. This could be especially useful for widely used generic template classes, in particular for templates like idList and std::set. There are two approaches to provide this functionality:

  • PDB. Attribute each symbol to the source or header file where its implementation is physically located, instead of attributing it to the .obj file where it is compiled. As a result, the stats for the header file (where the template is defined) in the "file contributions" section should give the wanted information.

  • classpath. Extract the classpath from the name of each symbol, i.e. the class name with all namespaces containing it, e.g. boost::container::deque. Then show the amount of code for each classpath. In addition, statistics per namespace (e.g. boost) can be shown just like statistics per directory are shown in the "file contributions" section.
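The classpath idea can be sketched in a few lines of Python. This is a simplified illustration and not the code from the fork: it assumes the input is already a bare qualified name (no access specifier, return type or parameter list, which real undecorated names do contain), and both function names and all sizes are invented.

```python
from collections import Counter

def split_scopes(qualified_name):
    """Split on '::' that is not nested inside template brackets."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(qualified_name):
        ch = qualified_name[i]
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if depth == 0 and qualified_name.startswith("::", i):
            parts.append("".join(current))
            current = []
            i += 2
            continue
        current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts

def classpath_prefixes(qualified_name):
    """Yield '.', '.::ns', '.::ns::Class' for a name 'ns::Class::method'."""
    scopes = split_scopes(qualified_name)[:-1]  # drop the method name
    path = "."
    yield path
    for scope in scopes:
        path += "::" + scope
        yield path

# (qualified name, size in bytes); the sizes are invented.
symbols = [
    ("boost::container::deque<T>::push_back", 120),
    ("boost::container::deque<T>::clear", 80),
    ("boost::shared_ptr<T>::reset", 50),
]

totals = Counter()
for name, size in symbols:
    for prefix in classpath_prefixes(name):
        totals[prefix] += size  # each symbol counts towards every enclosing scope

for path, size in totals.most_common():
    print(f"{size:>8}  {path}")
```

Every symbol contributes to its class and to each enclosing namespace, which is the same roll-up that the "file contributions" section does for directories.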

Both approaches are implemented in the updated version of SymbolSort (currently available in my fork). The classpath approach works without any additional input, as long as the undname.exe tool is in PATH. The PDB approach (as the name suggests) requires .pdb files to be provided using -info parameters. For the bloatinfo.py script, it is enough to simply provide the root directory containing the .pdb files:

bloatinfo --obj=. --pdb=exeDir --dump --analyze

The script passes all the .pdb files present in this directory to SymbolSort. Here is the full list of command line arguments passed to SymbolSort in my case (seen in options.txt file):

-out object_files.report
-in:comdat .\DarkModTools\all_obj.smb -in:comdat .\idLib\all_obj.smb
-info exeDir\TheDarkModx64.pdb              # determine code location (source file) using this pdb
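For illustration, here is how such an argument list could be assembled in Python. This is a hypothetical sketch rather than the actual bloatinfo.py code: the function name is invented, and searching the directory recursively is an assumption. Only the -out, -in:comdat and -info parameters shown above are used.

```python
from pathlib import Path

def symbolsort_args(smb_files, pdb_dir, report="object_files.report"):
    """Assemble SymbolSort arguments from symbol dumps and a PDB directory."""
    args = ["-out", report]
    for smb in smb_files:
        args += ["-in:comdat", str(smb)]
    # Assumption: every .pdb found under pdb_dir is passed via -info.
    for pdb in sorted(Path(pdb_dir).rglob("*.pdb")):
        args += ["-info", str(pdb)]
    return args
```

The resulting list can be appended to the SymbolSort executable name and handed to subprocess.run.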

As usual, you can download the full report. It differs from the previous one in only two sections: "file contributions" and newly added "namespaces and classes contributions". Here you can see the first one, which shows results of the PDB approach:

File Contributions
--------------------------------------
Sorted by Size
        Size   Count  Source Path
    16996266   99982  c:
    13219968   65465  c:\thedarkmod\darkmod_src
     4986400   24553  c:\thedarkmod\darkmod_src\game
     4385995   22000  c:\thedarkmod\darkmod_src\idlib
     3776298   34517  c:\program files (x86)\microsoft visual studio 12.0\vc
     3748270   23115  [not_in_pdb]
     3646356   33365  c:\program files (x86)\microsoft visual studio 12.0\vc\include
     1965992    8770  c:\thedarkmod\darkmod_src\tools
     1594575    6939  c:\program files (x86)\microsoft visual studio 12.0\vc\include\xtree
     1348758    7184  c:\thedarkmod\darkmod_src\idlib\math
     1255015    7762  c:\thedarkmod\darkmod_src\game\ai
      952373    3052  c:\thedarkmod\darkmod_src\tools\radiant
      866746     424  c:\thedarkmod\darkmod_src\idlib\precompiled.cpp
      837893    6008  c:\thedarkmod\darkmod_src\idlib\containers
      716211    5007  c:\thedarkmod\darkmod_src\idlib\containers\list.h
      682448    1730  c:\thedarkmod\darkmod_src\renderer
      522322    3779  c:\thedarkmod\darkmod_src\framework
      456809    1828  c:\thedarkmod\darkmod_src\game\physics
      446605    7671  c:\program files (x86)\microsoft visual studio 12.0\vc\include\xmemory0
      392003    2832  c:\program files (x86)\microsoft visual studio 12.0\vc\include\xstring
      371915    4260  c:\thedarkmod\darkmod_src\idlib\str.h
      340839    5005  c:\program files (x86)\microsoft visual studio 12.0\vc\include\memory
      337826    3349  c:\thedarkmod\darkmod_src\idlib\math\vector.h
      323912     659  c:\thedarkmod\darkmod_src\idlib\geometry
      322392    1728  c:\thedarkmod\darkmod_src\game\gamesys
      315030    1384  c:\thedarkmod\darkmod_src\game\ai\ai.cpp
      296359     693  c:\thedarkmod\darkmod_src\tools\compilers
      288751     809  c:\thedarkmod\darkmod_src\idlib\bv
      280758     129  c:\thedarkmod\darkmod_src\idlib\math\matrix.cpp
      280446    1479  c:\thedarkmod\darkmod_src\game\ai\states
      262804    1280  c:\thedarkmod\darkmod_src\ui
      256771    3436  [unclear_source]
        ...      ...  ...

We can learn a lot of things from this list:

  1. The whole C/C++ standard library generates 3.7 MB of object code.
  2. The implementation of std::set / std::map (xtree) generates at least 1.5 MB of code. 800 KB is generated by STL memory management, perhaps also for set/map. Despite being rarely used, std::string seems to generate 400 KB, a bit more than the widely used idStr from str.h.
  3. idList from list.h generates only 700 KB of code, which is very little. idStr generates 370 KB, and all vector math generates 330 KB.
  4. Among the different components: game logic (game) generates 5 MB, id's generic library (idlib) generates 4.4 MB, in-game tools (tools) generate 2 MB, the framework (framework) generates 500 KB, and the renderer core (renderer, written in C) generates 700 KB. The results would probably be more precise if we removed the bss and data sections from here.

A minor nuisance of this report is that we see two special filenames: [not_in_pdb] and [unclear_source]. These categories contain all symbols for which SymbolSort failed to find a proper location. A symbol gets into [unclear_source] when it has some code and is present in the PDB, but does not have any source code location. Mainly, these are the implicitly generated class methods: automatically generated constructors, destructors and assignment operators. Here are some examples:

public: virtual void * __cdecl idAASLocal::`scalar deleting destructor'(unsigned int)
public: void * __cdecl eas::tdmEAS::`scalar deleting destructor'(unsigned int)
public: void * __cdecl idList<struct SBoolParseNode>::`vector deleting destructor'(unsigned int)
public: class idAASSettings & __cdecl idAASSettings::operator=(class idAASSettings const &)
public: __cdecl idDrawVert::idDrawVert(void)
public: class idSurface & __cdecl idSurface::operator=(class idSurface const &)
void __cdecl `vector constructor iterator'(void *,unsigned __int64,int,void * (__cdecl*)(void *))
void __cdecl `public: static class Library<class ai::State>::Instance & __cdecl ai::Library<class ai::State>::Instance(void)'::`2'::`dynamic atexit destructor for '_instance''(void)
public: __cdecl std::_Tree_buy<struct std::pair<int const ,class CFrobDoor *>,class std::allocator<struct std::pair<int const ,class CFrobDoor *> > >::~_Tree_buy<struct std::pair<int const ,class CFrobDoor *>,class std::allocator<struct std::pair<int const ,class CFrobDoor *> > >(void)
public: __cdecl std::_Iterator012<struct std::bidirectional_iterator_tag,struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class std::function<class std::shared_ptr<class ai::Task> __cdecl(void)> >,__int64,struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class std::function<class std::shared_ptr<class ai::Task> __cdecl(void)> > const *,struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class std::function<class std::shared_ptr<class ai::Task> __cdecl(void)> > const &,struct std::_Iterator_base12>::_Iterator012<struct std::bidirectional_iterator_tag,struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class std::function<class std::shared_ptr<class ai::Task> __cdecl(void)> >,__int64,struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class std::function<class std::shared_ptr<class ai::Task> __cdecl(void)> > const *,struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class std::function<class std::shared_ptr<class ai::Task> __cdecl(void)> > const &,struct std::_Iterator_base12>(struct std::_Iterator012<struct std::bidirectional_iterator_tag,struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class std::function<class std::shared_ptr<class ai::Task> __cdecl(void)> >,__int64,struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class std::function<class std::shared_ptr<class ai::Task> __cdecl(void)> > const *,struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class std::function<class std::shared_ptr<class ai::Task> __cdecl(void)> > const &,struct std::_Iterator_base12> const &)

The larger [not_in_pdb] category consists of symbols which are not present in the PDB: initialization/finalization code for global and static variables, static functions, exception handling code, and runtime check cookies. Here are some examples:

void __cdecl `dynamic initializer for 'engineVersion''(void)
void __cdecl std::`dynamic initializer for 'piecewise_construct''(void)
void __cdecl `dynamic initializer for 'public: static class idTypeInfo CAbsenceMarker::Type''(void)
void __cdecl `dynamic atexit destructor for 'public: static class idTypeInfo CAbsenceMarker::Type''(void)
int `public: static class Alloc * __cdecl idAAS::Alloc(void)'::`1'::dtor$0
int `public: virtual void __cdecl idAASLocal::DeReferenceDoor(class CFrobDoor *,int)'::`1'::dtor$0
int `public: class dtor$0 & __cdecl idList<class idDrawVert>::operator=(class dtor$0 const &)'::`1'::dtor$0
int `public: void __cdecl idAASLocal::ShowArea(class idVec3 const &)const '::`2'::lastAreaNum
class idPlane `public: virtual class idPlane const & __cdecl idAASLocal::GetPlane(int)const '::`5'::dummy
unsigned int `public: virtual class idPlane const & __cdecl idAASLocal::GetPlane(int)const '::`5'::$S5
void __cdecl Com_EditSounds_f(class idCmdArgs const &)
int `public: static class idClass * __cdecl CAbsenceMarker::CreateInstance(void)'::`1'::catch$0
public: static void (__cdecl* std::_Error_objects<int>::_Iostream_object$initializer$)(void)
S12<`template-parameter-2',idAFEntity_VehicleSimple::wn,unsigned int, ?? &>
int `public: void __cdecl std::allocator<struct std::_Tree_node<struct std::pair<enum ai::EAlertState const ,class idStr>,void *> >::construct<struct std::_Tree_node<struct std::pair<enum ai::EAlertState const ,class idStr>,void *> *,struct std::_Tree_node<struct std::pair<enum ai::EAlertState const ,class idStr>,void *> * &>(struct allocator<struct std::_Tree_node<struct std::pair<enum ai::EAlertState const ,class idStr>,void *> >::_Tree_node<struct std::pair<enum ai::EAlertState const ,class idStr>,void *> * *,struct allocator<struct std::_Tree_node<struct std::pair<enum ai::EAlertState const ,class idStr>,void *> >::std * &)'::`1'::dtor$0
?ToString@idBox@@QEBAPEBDH@Z$rtcName$0

It is indeed very enlightening to look at all this crazy stuff and try to understand what it is about =) I still have no idea what the hell the highlighted line is. It looks like even the undecorating code did not work properly for that symbol.


And now let's look at the new section called "Namespaces and classes contributions", which is the outcome of the "classpath" approach discussed above. Note that individual methods are not shown here, because the last component of the classpath (usually the method name) is stripped off. The leading .:: is added for implementation simplicity.

Namespaces and classes Contributions
--------------------------------------
Sorted by Size
        Size   Count  Source Path
    20763966  125021  .
    4482084   45792  .::std
    1210159    3772  .::std::_Tree<T>
    1037109    2296  .::ATL
    954888     738  .::ATL::CTrace
    731544    5265  .::idList<T>
    654808    3972  .::ai
    417844    4085  .::idStr
    375015     829  .::idAI
    349640    2411  .::std::basic_string<T>
    289220    2598  .::idVec3
    281714     502  .::idPlayer
    237341    1512  [unknown]
    203972    1032  .::idEntity
    190771    1284  .::std::_Tree_const_iterator<T>
    189928     738  .::idBounds
    175113    3761  .::std::allocator<T>
    171348     232  .::idCollisionModelManagerLocal
    166539    1258  .::std::_Tree_alloc<T>
    165286    2140  .::std::shared_ptr<T>
    153105     293  .::idGameLocal
    152834    1480  .::std::_Iterator_base12
    152768    2514  .::std::_Wrap_alloc<T>
    150042     757  .::idMat3
    145294      62  .::idRenderMatrix
    131388     283  .::idMatX
    128163     453  .::idClass
    126683    2332  .::std::_Ref_count<T>
    122005    1541  .::std::_Func_impl<T>
    121407     294  .::idWindow
    112618     396  .::ai::State
    109591     746  .::std::_Tree_buy<T>
    105087    1042  .::idDict
    102964     827  .::std::_Func_class<T>
       ...     ...  ...

Here are some notes about these results:

  1. Including all the non-PDB symbols, STL generates 4.5 MB of code. At least 1.6 MB of that comes from the highlighted std::_Tree* classes, which make up the set/map implementation.
  2. It looks like ATL::CTrace generates 1 MB of code, but judging from the other parts of the report, most of this size consists of .bss symbols, i.e. not code at all.
  3. We confirm that the idList template class generates 730 KB of code, and idStr generates 420 KB.
  4. Among the vector math classes, idVec3 is obviously the most popular: it generates 290 KB of code. Classes idBounds and idMat3 are also rather widely used.
  5. The whole ai namespace (AI of TDM) generates 650 KB of object code.

As with the PDB approach shown previously, here we also see some symbols with an [unknown] classpath. It is hard to say exactly why they are unclassified, but given that classpath extraction is done (partly) with manual regex-based recognition, it is no surprise that some crazy cases do not work. The typical problematic cases are anonymous namespaces and lambda functions. The other failures are complex examples where a single pattern is not enough to deduce the classpath. Here are some examples:

class idStr __cdecl `anonymous namespace'::NormalisePath(class idStr const &)
void __cdecl VertexMinMax<class <lambda_e15de9cdacbee52611b15ce13c2dbb01> >(class idVec3 &,class idVec3 &,class idDrawVert const *,int,class <lambda_e15de9cdacbee52611b15ce13c2dbb01>)
bool __cdecl `anonymous namespace'::convert_buffer_utf16<struct `anonymous namespace'::opt_false>(char * &,unsigned __int64 &,void const *,unsigned __int64,struct A0xa82155d7::opt_false)
?parse@xml_parser@?A0xa82155d7@@SA?AUxml_parse_result@pugi@@PEAD_KPEAUxml_node_struct@4@I@Z$rtcName$0
int `class catch$0::basic_ostream<char,struct std::char_traits<char> > & __cdecl std::operator<<<struct std::char_traits<char> >(class catch$0::std &,char const *)'::`1'::catch$0
int `void __cdecl `dynamic initializer for 'sLootTypeName''(void)'::`1'::dtor$0
void __cdecl `dynamic initializer for 'public: static class idCVar <unnamed-tag>::in_mouse''(void)
class <lambda_e261b9d4d68e91887cf921d793e3e07c> `RTTI Type Descriptor'
[thunk]: __cdecl CBinaryFrobMover::`vcall'{1032,{flat}}' }'

As you see, neither approach is perfect. Both of them sometimes misinterpret or fail to interpret some symbols. But note that it is very hard to even define exactly how to determine the source file and the owning class for every symbol. The whole analysis is approximate anyway.

Implementation details

PDB approach

It is impossible to detect source code locations of functions using .obj files only, but this information is available in the .pdb file. That's why .pdb files must be specified in order to attribute symbols from .obj files to the headers where they are defined. The new parameter -info is introduced to specify such .pdb files: these files are read as usual, but their contents are not counted in the reported statistics. The contents of these .pdb files influence the report in only one way: the source filenames of all symbols are deduced from them.

If no parameters are specified with the -info key, then the analysis is performed the old way, i.e. each symbol is attributed to the .obj file where it is compiled. If at least one -info parameter is specified, then filename replacement is performed. Aside from that, the analysis continues as usual.

Every .pdb file specified as -info is read, and all its symbols are put into a special storage. These symbols do not get into the main storage, where the symbols read from .obj files are located. Then we establish a correspondence between the symbols of the two storages: for each symbol from the main storage, we find a symbol in the special pdb storage with the same raw name and take its filename to replace the original one. If there is no such symbol, the filename is set to [not_in_pdb].

Console.WriteLine("Connecting symbols to PDB info...");
int connectedCnt = 0, allCnt = symbols.Count;
foreach (Symbol s in symbols)       //"symbols"  --- main storage
{                                   //"infoDict" --- special pdb storage
    Symbol info;
    if (infoDict.TryGetValue(s.raw_name, out info))
    {
        connectedCnt++;
        s.source_filename = info.source_filename;
    }
    else
        s.source_filename = "[not_in_pdb]";
}
Console.WriteLine("Connected {0}% symbols ({1}/{2})", (uint)(100.0 * connectedCnt / allCnt), connectedCnt, allCnt);

The source filename replacement happens in the highlighted line (s.source_filename = info.source_filename). Here is an example for one symbol:

s.name = "public: __cdecl idList<class idVec4>::~idList<class idVec4>(void)"
s.raw_name = "??1?$idList@VidVec4@@@@QEAA@XZ"
s.source_filename = ".\DarkModTools\AAS.obj"                                // the original source filename (replaced)
info.source_filename = "c:\thedarkmod\darkmod_src\idlib\containers\list.h"  // the source filename from PDB (taken)

Correspondence between symbols is established based on their raw names, i.e. the decorated/mangled names. The original version of SymbolSort did not read these names from either comdat or PDB inputs, so the code had to be extended to do so. Let's start with the comdat output of an .obj file. Here is what it looks like:

SECTION HEADER #30
.text$mn name
       0 physical address
       0 virtual address
      2E size of raw data
    87CC file pointer to raw data (000087CC to 000087F9)
    87FA file pointer to relocation table
       0 file pointer to line numbers
       1 number of relocations
       0 number of line numbers
60501020 flags
         Code
         COMDAT; sym= "public: __cdecl idList<class idVec4>::~idList<class idVec4>(void)" (??1?$idList@VidVec4@@@@QEAA@XZ)
         16 byte align
         Execute Read

The undecorated name is extracted from the highlighted line (the one starting with COMDAT; sym=) using a simple regex, and it is trivial to extend the regex to extract the raw name from the trailing parentheses as well.
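To illustrate the idea, here is a minimal sketch of such an extended regex. It is not the actual SymbolSort code: the ComdatLine class and its TryParse method are hypothetical names, and the pattern assumes that the decorated name contains no whitespace and is the last parenthesized token on the line.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of SymbolSort.
public static class ComdatLine
{
    // Expected shape:  COMDAT; sym= "undecorated name" (decorated_name)
    // The undecorated name may contain parentheses itself, so the decorated
    // name is taken from the last parenthesized token before the end of line.
    private static readonly Regex ReSym = new Regex(
        @"COMDAT; sym= ""(?<name>.*)"" \((?<raw>\S+)\)\s*$",
        RegexOptions.Compiled);

    // Returns true and fills both names if the line is a COMDAT symbol line.
    public static bool TryParse(string line, out string name, out string rawName)
    {
        Match m = ReSym.Match(line);
        name = m.Success ? m.Groups["name"].Value : "";
        rawName = m.Success ? m.Groups["raw"].Value : "";
        return m.Success;
    }
}
```

Calling ComdatLine.TryParse on every line of the dumpbin output and skipping the lines where it returns false would be enough to collect both names for each COMDAT section.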

For the PDB file, extracting the decorated name is a bit harder. It turns out that there is a major distinction between public and private symbols in PDB. In particular, all public symbols have the raw symbol name stored in IDiaSymbol.name, while private symbols have a human-readable name stored there. Unfortunately, I did not know about this distinction, and SymbolSort extracts only private symbols by default, so I spent a lot of time trying to get the raw name out of private IDiaSymbol-s. I even found one way which works for most private symbols for no apparent reason (and fails on some of them, also for no apparent reason). Later I switched to using public symbols.

symbol.short_name = diaSymbol.name == null ? "" : diaSymbol.name;
symbol.name = diaSymbol.undecoratedName == null ? symbol.short_name : diaSymbol.undecoratedName;
symbol.flags = additionalFlags;

if (type == SymTagEnum.SymTagPublicSymbol)
{
    symbol.raw_name = symbol.short_name;
}
else
{
    //there is no reason this can work, but it often works...
    string rawName;
    IDiaSymbolUndecoratedNameExFlags flags =
        IDiaSymbolUndecoratedNameExFlags.UNDNAME_32_BIT_DECODE |
        IDiaSymbolUndecoratedNameExFlags.UNDNAME_TYPE_ONLY;
    diaSymbol.get_undecoratedNameEx((uint)flags, out rawName);
    if (rawName != null)
    {
        //ignore trashy names like " ?? :: ?? ::Z::_NPEBDI_N * __ptr64 volatile "
        if (!rawName.Contains(' '))
            symbol.raw_name = rawName;
    }
}

The first highlighted line (symbol.raw_name = symbol.short_name) simply takes IDiaSymbol.name as the raw name, provided that the symbol is public. I also enabled a flag so that public symbols are always extracted and take precedence, so the private-symbol branch is probably not necessary. For a private symbol, IDiaSymbol.name contains a partly undecorated name, but calling the get_undecoratedNameEx method with a weird flag combination often returns the correct result, e.g.:

diaSymbol.name = "idList<idVec4>::~idList<idVec4>"
diaSymbol.get_undecoratedNameEx(UNDNAME_TYPE_ONLY) = "??1?$idList@VidVec4@@@@QEAA@XZ"

The last problem I faced with the PDB approach was how source filenames are determined for symbols from the pdb. Normally, SymbolSort asks the DIA framework about the lines of code which correspond to the symbol's address, and if it finds any, it saves the file path of the first such line. So the filename of a pdb symbol appears to be exactly the source file where the debugger would stop if you put a breakpoint into the symbol. However, if a pdb symbol has no corresponding lines of code, SymbolSort detects the filename by looking at which translation unit provided the code at its address. This is exactly what we tried to avoid in the first place, not to mention that with such a mechanism all the duplicates of one symbol get attributed randomly to one of the many source files where it is used. That's why every symbol which has no lines of code in the PDB is marked accordingly, so that its filename is later replaced with [unclear_source] to avoid confusion.

Classpath approach

In the basic case, every code symbol is a method of some class, which is probably contained in other classes or namespaces. This whole construction (e.g. boost::container::map::insert) can be called a "classpath", which is similar to a file path, but with namespaces and classes instead of directories and files. The idea is to determine the classpath of each symbol from its name, then strip its last component (usually the method's name), and attribute the symbol to the remaining classpath. Note that we remove the method name for two reasons: 1) we want to obtain per-class statistics (a per-method report can be seen elsewhere), and 2) there are too many methods to list them all in the report.
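The attribution step can be sketched as follows. This is only an illustration: the Contribution and ClassPathStats types are hypothetical, and the sketch assumes that each symbol's size is added to every enclosing scope of its classpath (plus the artificial "." root), which is what the rolled-up numbers in the sample report suggest.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical accumulator: total size and symbol count for one scope.
public sealed class Contribution
{
    public long Size;
    public int Count;
}

// Hypothetical helper, not part of SymbolSort.
public static class ClassPathStats
{
    // "classpath" is the full path including the trailing method name,
    // e.g. { "std", "_Tree<T>", "insert" }.
    public static void Add(IDictionary<string, Contribution> stats, string[] classpath, long size)
    {
        // Strip the last component (usually the method name).
        int scopes = Math.Max(classpath.Length - 1, 0);

        // Every symbol counts towards the artificial root scope.
        string key = ".";
        Bump(stats, key, size);

        // ...and towards each enclosing namespace or class.
        for (int i = 0; i < scopes; i++)
        {
            key += "::" + classpath[i];
            Bump(stats, key, size);
        }
    }

    private static void Bump(IDictionary<string, Contribution> stats, string key, long size)
    {
        Contribution c;
        if (!stats.TryGetValue(key, out c))
            stats[key] = c = new Contribution();
        c.Size += size;
        c.Count++;
    }
}
```

With this scheme a symbol std::_Tree<T>::insert of 100 bytes adds 100 to ".", ".::std" and ".::std::_Tree<T>", while a free function like main only contributes to ".".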

As simple as it may sound, extracting the classpath is actually quite hard. Direct processing of the fully undecorated symbol name is problematic because it contains a lot of excessive info which can get in the way. While it is possible to remove template arguments by simply detecting outer-level angle brackets (which SymbolSort already does), simply splitting the symbol name by "::" won't work even in simple cases. Consider the following examples:

class std::_Ref_count_base * && __cdecl std::_Move<class std::_Ref_count_base * &>(class std::_Ref_count_base * &)
public: __cdecl std::_Wrap_alloc<class std::allocator<char> >::_Wrap_alloc<class std::allocator<char> >(class std::allocator<char> const &)

As you see, return values, function parameters and template arguments can all have internal "::" separators, which must not be used to split the classpath. Luckily, some of the routines for undecorating symbols support partial undecoration. Calling the WinAPI function UnDecorateSymbolName with the flag UNDNAME_NAME_ONLY produces an undecorated name without the return value and function parameters:

UnDecorateSymbolName("?Warning@idLib@@SAXPEBDZZ", 0):
  public: static void __cdecl idLib::Warning(char const * __ptr64,...)
UnDecorateSymbolName("?Warning@idLib@@SAXPEBDZZ", UNDNAME_NAME_ONLY):
  idLib::Warning

The partially undecorated symbol names are much easier to use: in the example above the result is already a ready-to-use classpath. However, not all cases are as simple as this one. Here is some hacky regex-based code which handles most of the cases (with tricky symbol names shown in comments):

private static string allowedSpecials = @"<=>,\[\]()!~^&|+\-*\/%" + "$";
private static string reClassWord = @"[\w " + allowedSpecials + "]+";
private static string reClassPath = String.Format(@"({0}::)*{0}", reClassWord);
private static Regex regexClassPath = new Regex("^" + reClassPath + "$", RegexOptions.Compiled);
private static string reLocalEnd = @".*";  //@"(`.+'|[\w]+(\$0)?)";
private static Regex regexFuncLocalVar = new Regex(String.Format(@"^`({0})'::`[\d]+'::{1}$", reClassPath, reLocalEnd));

public static string[] Run(string short_name) {
    //(all string constants)
    if (short_name == "`string'")
        return new string[] { short_name };

    // Array<SharedPtr<Curve>, Allocator<SharedPtr<Curve>>>::Buffer::capacity
    // std::_Error_objects<int>::_System_object$initializer$
    if (regexClassPath.IsMatch(short_name))
        return splitByColons(short_name);

    // std::bad_alloc `RTTI Type Descriptor'
    const string rttiDescr = " `RTTI Type Descriptor'";
    if (short_name.EndsWith(rttiDescr)) {
        string[] res = Run(short_name.Substring(0, short_name.Length - rttiDescr.Length));
        if (res == null) return null;
        return res.Concat(new string[] { rttiDescr.Substring(1) }).ToArray();
    }

    // `CustomHeap::~CustomHeap'::`1'::dtor$0
    // `std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Copy'::`1'::catch$0
    // `CustomHeap<ShapeImpl>::instance'::`2'::some_var
    // `HeapWrap < ShapeImpl >::Stub::get'::`7'::`local static guard'
    // `HeapWrap<ShapeImpl>::Stub::get'::`7'::`dynamic atexit destructor for 'g_myHeap''
    // `Mesh::projectPoints'::`13'::$S1
    // `GroupElement::getNumElements'::`2'::MyCounter::`vftable'
    if (regexFuncLocalVar.IsMatch(short_name))
        return Run(regexFuncLocalVar.Match(short_name).Groups[1].Value);

    // `dynamic initializer for 'BoundingBox::Invalid''
    // `dynamic initializer for 'std::_Error_objects<int>::_System_object''
    // std::`dynamic initializer for '_Tuple_alloc''
    // UniquePtr<Service>::`scalar deleting destructor'
    if (short_name.EndsWith("'")) {
        int backtickPos = short_name.IndexOf('`');
        if (backtickPos >= 0) {
            string prefix = short_name.Substring(0, backtickPos);
            string quoted = short_name.Substring(backtickPos + 1, short_name.Length - backtickPos - 2);
            if (quoted.Count(c => c == '\'') == 2) {
                int left = quoted.IndexOf('\'');
                int right = quoted.LastIndexOf('\'');
                quoted = quoted.Substring(left + 1, right - left - 1);
            }
            string[] quotedWords = Run(quoted);
            if (quotedWords == null)
                return null;
            string[] prefixWords = splitByColons(prefix);
            return prefixWords.Take(prefixWords.Length - 1).Concat(quotedWords).ToArray();
        }
    }

    return null;
}
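The splitByColons helper used above is not shown in the listing. Here is a minimal sketch of what such a helper could look like, assuming it has to ignore "::" separators nested inside template arguments or parameter lists. The ClassPathSplitter class is a hypothetical stand-in and not the code from the fork.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the splitByColons helper.
public static class ClassPathSplitter
{
    // Splits the name on "::" separators found outside <...> and (...).
    public static string[] SplitByColons(string path)
    {
        var parts = new List<string>();
        int depth = 0;   // current nesting level of brackets
        int start = 0;   // start index of the current component
        for (int i = 0; i < path.Length; i++)
        {
            char c = path[i];
            if (c == '<' || c == '(')
            {
                depth++;
            }
            else if (c == '>' || c == ')')
            {
                if (depth > 0) depth--;
            }
            else if (c == ':' && depth == 0 && i + 1 < path.Length && path[i + 1] == ':')
            {
                parts.Add(path.Substring(start, i - start));
                start = i + 2;
                i++;     // skip the second colon
            }
        }
        parts.Add(path.Substring(start));
        return parts.ToArray();
    }
}
```

Note that a name ending with "::" yields an empty last component, which matches how Run drops the last word of the prefix before a backtick. Plain bracket counting is not reliable for names such as operator< or operator<<, so a real implementation would need extra care there.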

Unfortunately, we are not yet done with the classpath approach =) It turned out that the UnDecorateSymbolName function is no longer maintained. This function belongs to the Dbghelp.dll library, the latest version of which was released in February 2010. That was even before C++11 was out, so this function fails to work on symbols containing rvalue references. There is an updated version called _unDName, which is not publicly exposed, and an even newer version called __unDNameEx, also for internal usage only. The latter version is used in the undname.exe program, which is a tool distributed with Visual C++.

In the end I decided to use undname.exe directly by creating a temporary file and spawning the tool with CreateProcess. The tool accepts a file containing many symbol names, in which case it undecorates them all, so there are no performance problems such as creating a process thousands of times. Also, it is safe to assume that the user has undname.exe in PATH, since they have to run dumpbin.exe to decode .obj files anyway, and it is located in the same directory.
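A rough sketch of the batch invocation is shown below. It uses .NET's Process.Start rather than calling CreateProcess directly, and it is not the code from the fork: the UndnameBatch class is hypothetical, and passing the file path (optionally preceded by flags) as the command line of undname.exe is an assumption based on the description above. The raw output is returned as-is, because its exact format is not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Hypothetical helper, not part of SymbolSort.
public static class UndnameBatch
{
    // Writes one decorated name per line into a temporary file and returns its path.
    public static string WriteNamesFile(IEnumerable<string> decoratedNames)
    {
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, decoratedNames);
        return path;
    }

    // Spawns undname.exe once for the whole batch and returns its raw standard output.
    // ASSUMPTION: the tool takes the file path as its argument; "extraArgs" is a
    // placeholder for any flags and is passed through unchanged.
    public static string Run(IEnumerable<string> decoratedNames, string extraArgs = "")
    {
        string path = WriteNamesFile(decoratedNames);
        try
        {
            var psi = new ProcessStartInfo
            {
                FileName = "undname.exe",   // expected to be found in PATH
                Arguments = (extraArgs + " \"" + path + "\"").Trim(),
                UseShellExecute = false,
                RedirectStandardOutput = true,
                CreateNoWindow = true,
            };
            using (Process p = Process.Start(psi))
            {
                string output = p.StandardOutput.ReadToEnd();
                p.WaitForExit();
                return output;
            }
        }
        finally
        {
            File.Delete(path);              // always clean up the temporary file
        }
    }
}
```

Since the process is started only once per batch, the cost of spawning it does not depend on the number of symbols.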

Afterword

I hope the two improvements will give a clearer picture of how much bloat headers and templates produce. In my opinion, both approaches are useful, despite being hacky and not 100% precise. Right now they are available in my fork, but I will prepare pull requests in the near future.

SymbolSort has two more features which I never tried: analyzing nm dumps and producing diff reports. The latter feature may be very useful in continuous integration if you really care about build times and code bloat.
