пятница, 8 марта 2024 г.

Profiling shared libraries on linux

Disclaimer: proposed approach uses dirty hacks & patches and tested on x86_64 only so use it at your own risk. Also no chatGPT or some another Artificial Idiots were used for this research

Lets assume that we have shared library (for example R extension or python module) and we want to know where and why it spending many hours and consuming megawatts of electricity. There is even semi-official way to do this:

  1. compile shared library with -pg option
  2. set envvar LD_PROFILE_OUTPUT to directory where you want to store profiling data
  3. set envvar LD_PROFILE to filename of library to profile
  4. run your program. Well, sounds that you need lots of things to do before this step and you can`t set up profiling dynamically
  5. run sprof on profiling log

Unfortunately this method just don`t work - sprof fails with cryptic message
Inconsistency detected by ld.so: dl-open.c: 890: _dl_open: Assertion `_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!

Seems that this long lived bug known since 2017 and still not fixed
Lets try to discover some more reliable way and start with inspection of code generated for profiling

воскресенье, 14 января 2024 г.

failed attempts to draw graphs

CSES has several really hard graph-related tasks, for example

It would be a good idea to visualize those graphs. One of well-known tool to do this is Graphviz, so I wrote simple perl script to render graph from CSES plain text into their DSL. On small graphs all goes well and we can enjoy with something like

But seems that on big graphs with 200k nodes dot just can`t finish rendering and after ~2 hours of hard work met with OOM killer. Lets think how we can reduce size of graph

четверг, 4 января 2024 г.

Distinct Colors

I`ve solved yet another very funny CSES task - it looks very similar to another task called "Reachable Nodes" (my solution for it). The only difference is that we asked to count not unique nodes but colors of nodes. What can go wrong?

And this is where funny part begins - my patched solution got crashes. gdb didn`t showed nothing interesting. However I remember scary cryptic command to show stack usage:

print (char *)_environ - (char *)$sp
$1 = 8384904

Very close to default 8Mb (check ulimit -s). Wait, WHAT? Do we really have stack exhausting? Lets check - 8 * 1024 * 1024 = 8388608 bytes. Tree can have 200000 nodes. 8388608 / 200000 = ~42 bytes for each recursive DFS call. Seems to be true - in each call we store return address + stack frame RBP + 3 registers holding args (this, indexes of node and parent) - so at least 5 * 8 = 40 bytes. It`s so happened that some tests contain tree with very long stem from root till end, so yes - recursive DFS cannot visit all nodes in such tree. Solution is simple - we can emulate recursion with std::stack. As bonus for all nodes in stack we can use single bit mask to save space

Another unpleasant observation is that trees in tests ain't BINARY trees. When one picture is worth a thousand words:

Degree of node 2 is 4. This is main reason why function dfs has separate branch for processing joint nodes with only 2 descendants - bcs initially method is_fork returned only left and right

Source

воскресенье, 31 декабря 2023 г.

Architecture and Design of the Linux Storage Stack

Not perfect but suitable book considering the small number of books about linux internals. IMHO most useful is chapter 10, so below is brief summary of the presented tools

And I have stupid question - has anyone already merged all this zoo in some cmdlet/package for linux powershell to have common API? At least I was unable to find something similar on powershellgallery

воскресенье, 17 декабря 2023 г.

Filling Trominos

IMHO this is very hard task - only 104 accepted solutions. My solution is here

Google gives lots of links for trominos but they all for totally different task from Euler Project - in our case we have only L-shapes. So lets think about possible algorithm

It`s pretty obvious that we can make 2 x 3 or 3 x 2 rectangles with couple of L-trominos. So naive solution is just to check if one size is divisible by 2 and other by 3

However with pen and paper you can quickly realize that you can for example fill rectangle 5 x 6:

aabaab
abbabb
ccddee
dcdced
ddccdd

Algo can look like (see function check2x3)

  • if one side of rectangle is divisible by 6 then another minus 2 should be divisible by 3
  • if one side of rectangle is divisible by 6 then another minus 3 should be divisible by 2

Submit our solution and from failed tests suddenly discovering that you also can have rectangle 9 x 5. Some details how this happens

So we can have maximal 3 groups of different shapes:

  • 9 x 5 rectangle (or even several if sides multiples of 5 & 9) - in my solution it stored in field has_95
  • 1 or 2 groups of 2 x 3 rectangles below 9 x 5 shape. 1 for case when you can fill this area with shapes 2 x 3 of the same orientation and 2 if you must mix vertical and horizontal rectangles - field trom
  • the same 1 or 2 groups on right of 9 x 5 shape - field right

Now the only remained problem is coloring

Rectangle 9 x 5 has 5 different colors but it is possible to arrange trominos in such way that on borders it will have only 4 colors and 5th is inside. For groups of 2 x 3 rectangles you need 4 colors if group size is 1 and yet 4 if size is 2. In worst case number of colors is 4 for 9 x 5 + 2 * 2 * 4 = 20 - so we can fit in A-Z

воскресенье, 12 ноября 2023 г.

my solutions for couple CSES tasks

CSES has two very similar by description tasks but with completely different solutions: "Critical Cities" (218 accepted solutions at time when I writing this) and "Visiting Cities" (381 accepted solutions)

Critical Cities

We are given an directed unweighted graph and seems that we need to find it`s dominators for example using Lengauer-Tarjan algo (with complexity O((V+E)log(V+E))
Then we could check each vertex in this dominators tree to see if it leads to target node, so overall complexity is O(V * (V+E)log(V+E))
 
This looks not very impressive IMHO. Lets try something completely different (c) Monty Python's Flying Circus. For example we could run wave (also known as Lee algorithm) from source to target and get some path with complexity O(V+E). Note that in worst case this path can contain all vertices. Lets mark all vertices in this path
Next we could continue to run waves but at this time ignoring edges from marked nodes and see what marked vertices are still reachable. For example on some step k we run wave from Vs and reached vertices Vi and Vj. We can conclude that all vertices in early found path between Vs and Vj are NOT critical cities. So we can repeat next step starting with Vj
This process can be repeated in worst case V times so overall complexity is O(V*(V+E))
 
My solution is here

 

Visiting Cities

At this time we are given an directed weighted graph and seems that simplest solution is to find all K-th shortest paths (for example with Yen algo) and make union of their vertices. Because I'm very lazy I decided to reuse some ready and presumably well-tested implementation of this algo. You can read about fabulous results here
 

After that I plunged into long thoughts until I decided to count how many paths of minimal length go through each vertex - actually we could run Dijkstra in both directions: from source to target and from target to source, counting number of paths with minimal length. And then we could select from this path vertices where product of direct counts with reverse equal to direct count on target (or reverse count on source) - it`s pretty obvious that you can`t avoid such vertices in any shortest path. Complexity of this solution is two times from Dijkstra algo (depending from implementation O(V^2) or O(V * log(V) + E * log(V)) using some kind of heap) + in worst case V checks for each vertices in first found shortest path

My solution is here

четверг, 9 ноября 2023 г.

kssp library, part 2

Previous part 

I was struck by the idea of how to reduce size of graph before enumerating all K-th shortest paths. We can use cut-points. By definition if we have cut-point in some shortest path it must be visited always - otherwise you can`t reach the destination. So algo is simple
  1. find with Dijkstra algo first shortest paths for whole graph
  2. find all cut-points in whole graph
  3. iterate over found shortest path - if current vertex is cut-point - we can run brute-force from previous cut-point till current

Results are crazy anyway - on the same test 7

  • Yen: 13.86s, 29787, 3501 cycles
  • Node Classification: 118.4s, 30013, 3501 cycles
  • Postponed Node Classification: 104.95s, 30013, 3501 cycles
  • PNC*: 120.33s, 30013, 3501 cycles
  • Parsimonious Sidetrack Based: 4.66s, 29980, 3501 cycles
  • Parsimonious Sidetrack Based v2: 4.79s, 29980, 3501 cycles
  • Parsimonious Sidetrack Based v3: 4.85s, 29980, 3501 cycles
  • Sidetrack Based: 4.18s, 29980, 3501 cycles
  • Sidetrack Based with update: 4.34s, 29980, 3501 cycles
At this time all algos worked to completion and again gave different results...
Source code