Will Blog

Nov 04

A game update

A couple nights ago I got the UDP server working. I took the code from another project I worked on (encrypted UDP client/server), broke apart all the modules that made up legacy of a warlord, finally took the code from the tile memory manager project and squished them all together into 2 parts server/client.

The server currently handles:

The client handles:

Moving forward now by adding in more modules from legacy of a warlord for the server. Next is the XML files for entities (mobs.xml). I decided I will split this file in half. Tiles/Ascii displayed will be controlled by the client. All the AI/attributes will be stored in the server.

I hope to have a release of the client up by Christmas and a server running somewhere on the internet. I think this time I will also try to have a linux release of the client. I really respect that community and like that OS.

I use Mercurial

I’ve used subversion and tortoise SVN (as the client) to control my source code. I find it absolutely necessary when I break something to be able to diff my code and figure out what was the wrong path.

When LIBTCOD maintainers changed to mercurial I decided to do the same. I am very well pleased with the change myself. It was easy enough to import my SVN into the mercurial server and Tortoise HG makes it a snap to compare, merge and diff versions.

Oct 09

Favorite STL container types c++

In the past week of coding/testing/profiling. I touched a few of the different STL container types.

Here is what I think about them:

1. STD::Vector. My favorite. Much like a standard c array, yet with a lot of methods attached to it. It is very quick. The ability to reserve space makes it absolutely effective in reducing memory bloat also saves me from having to manage the malloc/calloc/free.

2. STD::TR1::Unordered_Map. I use this on the HashNodesClass (branch object) storage, with only 40k - 100k keys it does a great job and is pretty fast. Seems to do a pretty good job managing it’s memory bloat and is pretty quick on lookups. I originally tried to use this as containers for tiles but wasn’t fast enough and the bloat was almost as bad as a std::map.

3. STD::Queue. I was playing with the idea of adding a cache method inside of the HashNodesClass class. Yet after a day of programming and testing, I found just using the random access method HashNodesClass::get was still quicker then going through the caching logic (even compared to a best case scenario of 100% cache hit rate on sequential reads). The checking of Queue.empty(), Queue.pop() and destructor of a std::pair<bool, unsigned int> all added up to 2 seconds for 20 million calls. Queue is pretty nice for its FIFO. I think if I were to play with caching again, I would try a STD::DeQueue (double ended queue).

4. Std::Map. For it’s ease of use and automatic clean up,indexing, sorting, it is the dull swiss army knife of C++ type storage. It’s great for anything that doesn’t require a LOT of data in it and require fast lookups. I learned a useful tidbit to avoid. Nested maps are insanely slow on deconstruction.

In closing, of the C++ STL types, the vector speed wins (if you can effectively access the index). Map is easier to manage. Unordered_map(hash_map) seems to fall squarely in the middle. Every tool has its purpose and I now have a slightly better understanding of them.

Oct 08

Previous posts picture. Full size. An OpenGL raw pixel display visual of my new c++ sparse data storage class.

Previous posts picture. Full size. An OpenGL raw pixel display visual of my new c++ sparse data storage class.

Finally, testing complete

I’m pretty satisfied with the results. I’m ready to move forward now that my sparse tile storage class is complete. I even took a few days to have some fun programming OpenGL on SDL. Attached is a picture of the pixel dump into a texture then displayed via a call to glTexSubImage2d.  

As you can see there is no visual loss of data from the original picture. This is a visual representation of the data stored in HashNodes class. A pixel represents either a root node, branch node or uncompressed tile.

Both of the below test applications used an abstract function (GetMapTile) to call an underlying data structure. I made sure both were using stack assigned variables.

Maybe I should make a lib out of it and post it on github or something.

gprof results

std::map
name: GetMapTile(unsigned int, unsigned int)
[rank]time%    self    children    called
[2]    79.2     0.12    10.08        20,160,000       

Core search lookup implementation
name: std::map<unsigned int,tilemap::Tile>(…)::find()
[rank]time%    self    children    called
[4]    77.2    0.03    9.92        20,160,000

            VS.

Will’s hash algorithm (WHA).
name: GetMapTile(unsigned int, unsigned int)
[rank]time%    self    children    called 
[4]     42.0      0.33      6.14    20,160,000        

Core search lookup implementation
name: BinarySearch(std::vector<unsigned int, std::allocator<unsigned int> > const&, int, int, unsigned int)
[rank]time%    self    children    called(+ recursive)
[9]    18.2      2.52    0.58      11390744 (+75491000)

Total Speed:  
std::map (10.20 seconds) / WHA ( 6.47 seconds ) = WHA 40% faster.

Memory storage:

***         Compression Test Results      ***

Control struture: (std::map)
  Size dimension: x=8400 y=2400
  Total tile count:20160000
  Size (MB):230
  Map overhead (MB):400 (had to observe in process explorer)
  Total memory (MB): 630

Branches structure: (WHA)
  Size dimension (both x&y):20
  Branches count:40470
  Total tile count:6676135
  Branches overhead (KB):2529 
  Size (MB):76
  Total Size (MB):79 

Total Compression/Memory efficiency = WHA has 87% smaller memory use.

Oct 02

1st data structure profile

Not as bad as I thought. Looking at the profiled data the biggest consumer of time is my UN-optimized use of SDL. The last entry on this list is the most promising that I am on the right track, 27,000,000 calls and tile retrievals from the data structure only took 0.12 of a second.

  %   cumulative   self         self      total          
 time   seconds   seconds    calls  ms/call  ms/call  name <Comment>

35.00      6.43      6.43       UNK                         inflateMark <SDL/DLL>

17.15      9.58     3.15     UNK                            inflateUndermine <SDL/DLL>

13.12     11.99     2.41    UNK                            pqdownheap <SDL/DLL>

3.48     12.63     0.64     UNK                        UM::Branch::GetSize()<stats info>

2.99     13.18     0.55 109967520  0.00    0.00  __mingw_pformat <sloppy>

2.78     13.69     0.51    UNK     __static_initialization_and_destruction_0(int, int)

2.78     14.20     0.51    UNK                              inflateCopy <SDL/DLL/Zlib?>

2.50     14.66     0.46 69600015    0.00    0.00  __gdtoa <too many types>

1.47     14.93     0.27 103947170  0.00    0.00  void std::vector<unsigned int*, std::allocator<unsigned int*> >::_M_insert_aux<unsigned int*>(__gnu_cxx::__normal_iterator<unsigned int**, std::vector<unsigned int*, std::allocator<unsigned int*> > >, unsigned int*&&) <creating the worldmap

0.63     16.32     0.12 27569449     0.00     0.00  UM::Branch::GetTile(unsigned char, unsigned char) <WOOT! Getting a tile is QUICK>

There are a few 0.5 second entries I can reduce with optimizing (I was sloppy). 

The hash storage class

I figured I’d share the storage class I’m using now. Little bit more overhead (2mb more overall) for 6 million tiles. Yet look ups are quicker. Total tiles size in memory is 86 mb (including bloat).

This class has a derived logic class as a higher layer in the program to act as an intermediary between data and SDL pixel drawing functions.

// HashNodeClass.hpp

class HashNodes
{
    private:
        Sint32 p2b;
        Uint32 count;
        Sint32 idlookup;
        void resize(void);
        Sint32 const getIndex( Uint32 const & hashlookup  );

        std::vector<Tile> TileEntry;
        std::vector<Uint32> Hash; // find on hash, get o(1) lookup on tile.
        std::vector<Sint8> HasData; // -1 delete, 0 empty, 1 yes

    public:
        HashNodes();
        virtual ~HashNodes();

        bool insert(Tile const & tl,Uint32 const & x , Uint32 const & y );
        bool remove (Uint32 const & hashlookup);
        Uint32 Count(void);
        Tile const * get (Uint32 const & hashlookup) ;
        void SortHash(void);
};

High level summary:

1. On creation all 3 vectors are initialized to a power of 2 amount of entries in memory (128 to start with).
2. Insert pushes data onto the stack into each of the 3 vectors simultaneously. These entries are on a parallel index (VERY IMPORTANT). So, Tile[0] will always line up with Hash[0] value.

3. Get() performs a binary search lookup on the sorted array against the hash value vector. When it finds a value that matches the 1d location of the tile requested, it returns the index of where the data was stored. Now the program performs an o(1) constant time look up to retrieve the tile and the hasdata.

4. Remove() sets values for deletion on the next sort.

5. SortHash() resorts all 3 vectors based on the hash values and erases any indexes that HasData equals -1.

6. Resize() - When the index amount on an insert surpasses pow(2,p2b) , it resizes all 3 arrays in memory to the next power of 2. Starting at 128 then 256 then 512. Etc.. For faster lookup on the binary searches.

Ran using this data structure last night. It takes about 30 seconds to load all 20,000,000 (2400x8400 map) into the same sized RGBA SDL surface for display.

I’m sure I have some more performance tweaks to make.

Then onto testing if this is faster then using a std::map.

I&#8217;m Close. I wrote up an SDL app to display the data retrieved from the compressed data stored in the tile map class.
All the water tiles are now lava. Oops. I think this might be an issue with my lack of experience with SDL and I&#8217;m reversing the colors displayed but the data is coming out fine. I bet that would be a hard core Terraria world to play. All that lava.
Soon as I am absolutely sure I am retrieving the data properly I&#8217;ll move onto hard core testing. Then post some code. There were a couple zone blocks to the left of the picture that didn&#8217;t come out of the bit bucket. I think they were compressed into NULL.

I’m Close. I wrote up an SDL app to display the data retrieved from the compressed data stored in the tile map class.

All the water tiles are now lava. Oops. I think this might be an issue with my lack of experience with SDL and I’m reversing the colors displayed but the data is coming out fine. I bet that would be a hard core Terraria world to play. All that lava.

Soon as I am absolutely sure I am retrieving the data properly I’ll move onto hard core testing. Then post some code. There were a couple zone blocks to the left of the picture that didn’t come out of the bit bucket. I think they were compressed into NULL.

Sep 30

Screenshot of a Terraria tile based zone (2400x8400)
I have  spent the past few days designing a tile engine. I want BIG, I mean  maybe an earth sized world to run around on. &#8230; But how is that even  possible in a rogue like, you might wonder.
Roguelike worlds have  space defined as tiles or cubes. I think of a tile being roughly 3 sq  ft (or however you want to interpret it). I just wouldn&#8217;t have enough  ram for a 41,849,280 by 41,717,280 tile grid (earth sized). Just the  thought of my windows paging all that space to my hard drive, I think it  would probably melt the hard drive and be done loading a zone in 1  month.
To tackle this problem I decided develop an algorithm and class that does a few things.
Stores or retrieves tiles from a compact hashed 1-d array.
Aggregates tile data based on the most common tile found in the branch.
Erases and consolidates branches to the root node (which is the most common tile in the world structure)
Re-Aggregates the branch if it has a new most common tile.
Here is a screenshot of a Terraria tile based world I made and imported into a std::map(2400,8400) structure with 20160000 tiles. The map is a test control. It uses about 230 mb + appx 400mb more ram. The +400 is all the map pointer/variable/poor memory allocation/etc&#8230; overhead from windows.
Picture: http://www.legacyofawarlord.com/images/world1.png (big picture)
Then, I ran the Tile Aggregator/compressor/sparse memory storage class against it (here is a visualized result)  Picture: http://www.legacyofawarlord.com/images/cmpout.png
Legend:  black = tiles not stored in the branch red = branch has no data, and was removed. dot @ 0,0 of branch. color tile = data that was not aggregated onto the branch root.
stdout results:
Generating test control structure:  Control structure generated. Determining map root tile (most used):    Compression root set to tile &lt;16777215&gt;   Generating branches and loading data from image: &#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230; - complete Saving compressed results picture to cmpout.png   Image save complete. ***         Compression Test Results      *** Control struture:   Size dimension: x=8400 y=2400   Total tile count:20160000   Size (MB):230
Branches structure:  Size dimension (both x&amp;y):20  Branches count:40230  Total tile count:6703849  Branches overhead (KB):1885  Size (MB):77  Total Size (MB):79Total Compression = 65%
The Total compression number is skewed. It doesn&#8217;t take into account windows bloated memory on the map structure by 400 mb.
So,  with that in place, I see no reason I couldn&#8217;t generate a 41,849,280 by  41,717,280 tile grid. Just take me awhile to populate it. It&#8217;ll be a lot of tweaks and maybe sub branches before I get it that large.
If you are a developer, you are probably thinking&#8230; wow that  must be slow on lookups. My first thought too, then again, 2 - o(1)  lookups and 2 variable comparisons should be much faster then, tile =  map[3000,8000]. Each lookup on that map takes about 7 steps ( o(log n) ).
I&#8217;ll be profiling it after I make a large world and throw a couple million random inserts and retrieval test passes against it. Then test the same against a std::map lookup. I can&#8217;t wait to see those results!

Screenshot of a Terraria tile based zone (2400x8400)

I have spent the past few days designing a tile engine. I want BIG, I mean maybe an earth sized world to run around on. … But how is that even possible in a rogue like, you might wonder.

Roguelike worlds have space defined as tiles or cubes. I think of a tile being roughly 3 sq ft (or however you want to interpret it). I just wouldn’t have enough ram for a 41,849,280 by 41,717,280 tile grid (earth sized). Just the thought of my windows paging all that space to my hard drive, I think it would probably melt the hard drive and be done loading a zone in 1 month.

To tackle this problem I decided develop an algorithm and class that does a few things.

Here is a screenshot of a Terraria tile based world I made and imported into a std::map(2400,8400) structure with 20160000 tiles. The map is a test control. It uses about 230 mb + appx 400mb more ram. The +400 is all the map pointer/variable/poor memory allocation/etc… overhead from windows.

Picture: http://www.legacyofawarlord.com/images/world1.png (big picture)

Then, I ran the Tile Aggregator/compressor/sparse memory storage class against it (here is a visualized result)
Picture: http://www.legacyofawarlord.com/images/cmpout.png

Legend:
black = tiles not stored in the branch
red = branch has no data, and was removed. dot @ 0,0 of branch.
color tile = data that was not aggregated onto the branch root.

stdout results:

Generating test control structure:
Control structure generated.
Determining map root tile (most used):
  Compression root set to tile <16777215>
  Generating branches and loading data from image:
…………………………………………… - complete
Saving compressed results picture to cmpout.png
  Image save complete.

***         Compression Test Results      ***

Control struture:
  Size dimension: x=8400 y=2400
  Total tile count:20160000
  Size (MB):230

Branches structure:
  Size dimension (both x&y):20
  Branches count:40230
  Total tile count:6703849
  Branches overhead (KB):1885
  Size (MB):77
  Total Size (MB):79

Total Compression = 65%

The Total compression number is skewed. It doesn’t take into account windows bloated memory on the map structure by 400 mb.

So, with that in place, I see no reason I couldn’t generate a 41,849,280 by 41,717,280 tile grid. Just take me awhile to populate it. It’ll be a lot of tweaks and maybe sub branches before I get it that large.

If you are a developer, you are probably thinking… wow that must be slow on lookups. My first thought too, then again, 2 - o(1) lookups and 2 variable comparisons should be much faster then, tile = map[3000,8000]. Each lookup on that map takes about 7 steps ( o(log n) ).

I’ll be profiling it after I make a large world and throw a couple million random inserts and retrieval test passes against it. Then test the same against a std::map lookup. I can’t wait to see those results!

Sep 13

Ideas on the games future

I have branched some of the code to test an idea… It has something to do with my recent exploits into UDP programming. Maybe implemented by 0.60.

Still have some core item/display work to iron out (bug hunting).

Also really, really want to move into sparse data structure storage for legacy of a warlord instead of storing every single cell. I have an idea for storing regions then only track the cells that are different.