2017/03/05

Why 4k is the optimal sector size for modern USB flash drives

Everyone has this special day in the year when you receive gifts (they call it a birthday, but I'm already too old to celebrate. Old enough to try to forget that the Earth has made yet another full circle around the Sun and that your head has become more "blond" than the last time you checked ;)

So I got a SanDisk 64GB USB flash drive which is advertised to deliver 100MB/s transfer speed. It uses USB 3.0, so I decided to give it a try and verify those claims. Another reason was my brother (who gave it to me) "taunting" me that "Windows handles it better" ;)



So I made a simple "dd" test, with raw reads and writes and no file system involved.

 # sh -c "sync && echo 3 > /proc/sys/vm/drop_caches"  
 # dd of=/dev/sdc if=/dev/zero bs=64M oflag=dsync,direct iflag=fullblock & echo $!; while sudo kill -SIGUSR1 $!; do sleep 1; done  
 [1] 22786  
 22786  
 0+0 records in  
 0+0 records out  
 0 bytes copied, 0,00882205 s, 0,0 kB/s  
 1+0 records in  
 1+0 records out  
 67108864 bytes (67 MB, 64 MiB) copied, 1,82117 s, 36,8 MB/s  
 2+0 records in  
 2+0 records out  
 134217728 bytes (134 MB, 128 MiB) copied, 3,39514 s, 39,5 MB/s  
 3+0 records in  
 3+0 records out  
 201326592 bytes (201 MB, 192 MiB) copied, 5,00308 s, 40,2 MB/s  
 4+0 records in  
 4+0 records out  
 268435456 bytes (268 MB, 256 MiB) copied, 6,58802 s, 40,7 MB/s  
 5+0 records in  
 5+0 records out  
 335544320 bytes (336 MB, 320 MiB) copied, 8,37488 s, 40,1 MB/s  
 # killall dd  

 # sh -c "sync && echo 3 > /proc/sys/vm/drop_caches"  
 # dd if=/dev/sdc of=/dev/null bs=64M iflag=fullblock,dsync,direct & echo $!; while sudo kill -SIGUSR1 $!; do sleep 1; done  
 [2] 31336  
 31336  
 1+0 records in  
 0+0 records out  
 0 bytes copied, 0,57976 s, 0,0 kB/s  
 2+0 records in  
 1+0 records out  
 67108864 bytes (67 MB, 64 MiB) copied, 1,13625 s, 59,1 MB/s  
 4+0 records in  
 3+0 records out  
 201326592 bytes (201 MB, 192 MiB) copied, 2,25124 s, 89,4 MB/s  
 6+0 records in  
 5+0 records out  
 335544320 bytes (336 MB, 320 MiB) copied, 3,36255 s, 99,8 MB/s  
 8+0 records in  
 7+0 records out  
 469762048 bytes (470 MB, 448 MiB) copied, 4,47168 s, 105 MB/s  
 10+0 records in  
 9+0 records out  
 603979776 bytes (604 MB, 576 MiB) copied, 5,58709 s, 108 MB/s  
 11+0 records in  
 10+0 records out  
 671088640 bytes (671 MB, 640 MiB) copied, 6,14454 s, 109 MB/s  
 13+0 records in  
 12+0 records out  
 805306368 bytes (805 MB, 768 MiB) copied, 7,25453 s, 111 MB/s  
 15+0 records in  
 14+0 records out  
 939524096 bytes (940 MB, 896 MiB) copied, 8,3586 s, 112 MB/s  
 17+0 records in  
 16+0 records out  
 1073741824 bytes (1,1 GB, 1,0 GiB) copied, 9,46021 s, 114 MB/s  

Looks like ~40MB/s for writes and ~100MB/s for reads. So ... yes, this is as advertised by SanDisk. You just need to look at the little stars placed next to the numbers; you will find their meaning on the other side of the box. They claim 100MB/s for reading and up to 10x the speed of a standard drive (4MB/s) for writing.

But I wondered how well my system would behave when using this drive with some file system on top. The most obvious choice is VFAT.
mkfs is your best friend. Let's use the default parameters then.


 # mkfs.vfat /dev/sdc1  
 mkfs.fat 3.0.28 (2015-05-16)  
 # fsck.vfat /dev/sdc1 -v  
 fsck.fat 3.0.28 (2015-05-16)  
 Checking we can access the last sector of the filesystem  
 Boot sector contents:  
 System ID "mkfs.fat"  
 Media byte 0xf8 (hard disk)  
 512 bytes per logical sector  
 32768 bytes per cluster  
 32 reserved sectors  
 First FAT starts at byte 16384 (sector 32)  
 2 FATs, 32 bit entries  
 7580160 bytes per FAT (= 14805 sectors)  
 Root directory start at cluster 2 (arbitrary size)  
 Data area starts at byte 15176704 (sector 29642)  
 1894928 data clusters (62093000704 bytes)  
 32 sectors/track, 64 heads  
 2048 hidden sectors  
 121305055 sectors total  
 Checking for unused clusters.  
 Checking free cluster summary.  
 /dev/sdc1: 0 files, 1/1894928 clusters   
 # cd /media/rad/837C-93FB/  
 # dd of=./test1.txt if=/dev/zero bs=32M oflag=dsync count=1000 & echo $!; while sudo kill -SIGUSR1 $!; do sleep 1; done  
 [1] 9897  
 9897  
 0+0 records in  
 0+0 records out  
 0 bytes copied, 0,0124351 s, 0,0 kB/s  
 0+1 records in  
 0+1 records out  
 18821120 bytes (19 MB, 18 MiB) copied, 1,0552 s, 17,8 MB/s  
 1+1 records in  
 1+1 records out  
 52375552 bytes (52 MB, 50 MiB) copied, 2,47131 s, 21,2 MB/s  
 2+1 records in  
 2+1 records out  
 85929984 bytes (86 MB, 82 MiB) copied, 4,14265 s, 20,7 MB/s  
 3+1 records in  
 3+1 records out  
 119484416 bytes (119 MB, 114 MiB) copied, 5,55161 s, 21,5 MB/s  
 4+1 records in  
 4+1 records out  
 153038848 bytes (153 MB, 146 MiB) copied, 7,0338 s, 21,8 MB/s  
 5+1 records in  
 5+1 records out  
 186593280 bytes (187 MB, 178 MiB) copied, 8,69431 s, 21,5 MB/s  
 6+1 records in  
 6+1 records out  
 220147712 bytes (220 MB, 210 MiB) copied, 10,1471 s, 21,7 MB/s  

What the f...? ~21MB/s !!!
So after a while I realized that this is probably due to the default sector size. Let's try to reformat with a 4k sector size:

 # mkfs.vfat /dev/sdc1 -S 4096 -s 1  
 mkfs.fat 3.0.28 (2015-05-16)  
 # fsck.vfat /dev/sdc1 -v  
 fsck.fat 3.0.28 (2015-05-16)  
 Checking we can access the last sector of the filesystem  
 Boot sector contents:  
 System ID "mkfs.fat"  
 Media byte 0xf8 (hard disk)  
 4096 bytes per logical sector  
 4096 bytes per cluster  
 32 reserved sectors  
 First FAT starts at byte 131072 (sector 32)  
 2 FATs, 32 bit entries  
 60534784 bytes per FAT (= 14779 sectors)  
 Root directory start at cluster 2 (arbitrary size)  
 Data area starts at byte 121200640 (sector 29590)  
 15133546 data clusters (61987004416 bytes)  
 32 sectors/track, 64 heads  
 2048 hidden sectors  
 15163136 sectors total  
 Checking for unused clusters.  
 Checking free cluster summary.  
 /dev/sdc1: 0 files, 1/15133546 clusters  
 root@rad-desktop:/media/rad# cd  
 0AC2-20C2/ dysk/  
 root@rad-desktop:/media/rad# cd 0AC2-20C2/  
 root@rad-desktop:/media/rad/0AC2-20C2# dd of=./test.txt if=/dev/zero bs=64M oflag=dsync,direct iflag=fullblock & echo $!; while sudo kill -SIGUSR1 $!; do sleep 1; done  
 [1] 21689  
 21689  
 0+0 records in  
 0+0 records out  
 0 bytes copied, 0,0138234 s, 0,0 kB/s  
 1+0 records in  
 1+0 records out  
 67108864 bytes (67 MB, 64 MiB) copied, 2,02883 s, 33,1 MB/s  
 1+0 records in  
 1+0 records out  
 67108864 bytes (67 MB, 64 MiB) copied, 2,03412 s, 33,0 MB/s  
 2+0 records in  
 2+0 records out  
 134217728 bytes (134 MB, 128 MiB) copied, 3,77909 s, 35,5 MB/s  
 3+0 records in  
 3+0 records out  
 201326592 bytes (201 MB, 192 MiB) copied, 5,97208 s, 33,7 MB/s  
 4+0 records in  
 4+0 records out  
 268435456 bytes (268 MB, 256 MiB) copied, 7,93711 s, 33,8 MB/s  
 5+0 records in  
 5+0 records out  
 335544320 bytes (336 MB, 320 MiB) copied, 9,74674 s, 34,4 MB/s  
 6+0 records in  
 6+0 records out  
 402653184 bytes (403 MB, 384 MiB) copied, 11,8841 s, 33,9 MB/s  
 7+0 records in  
 7+0 records out  
 469762048 bytes (470 MB, 448 MiB) copied, 13,8475 s, 33,9 MB/s  
 ^C  

35MB/s ... hmmm. So the reason for this is the native sector size of the flash drive. It seems that the device advertises 512B as its logical sector size but internally uses 4k. Because of this, each misaligned write has to be turned into a read-modify-write cycle by the drive's firmware, which degrades performance.
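
By the way, to see what the drive itself reports to the kernel, you can ask for the logical and physical block size (assuming the stick shows up as /dev/sdc, as in my case). Keep in mind that many USB sticks simply report 512 bytes for both, so the answer is not always trustworthy:

 # blockdev --getss --getpbsz /dev/sdc  
 # cat /sys/block/sdc/queue/logical_block_size /sys/block/sdc/queue/physical_block_size  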
I was too lazy to check what the mkfs.vfat tool would do if the drive reported 4k sectors. I suppose it would use bigger FAT sectors, but that's only a wild guess (or rather a wish).

Now, you probably wonder... how many wrongly formatted flash drives are in your possession? ;)

2016/11/10

Mapping device memory with mmap and MAP_PRIVATE on /dev/mem

From time to time you hit something which proves that even an old dog can learn new tricks.
Did you ever need to mmap() device memory?

I recently worked on one of the implementations of ODP (a dataplane framework similar to DPDK), and naturally I needed to mmap device or physical address space into a userspace process.

Mapping PCI BAR0, simple stuff, right?

addr = mmap(NULL, BAR0_SIZE, PROT_READ, MAP_PRIVATE, mem_fd, BAR0_OFFSET);

So why did I use MAP_PRIVATE, and what is wrong with it?

Usually, when you map a piece of memory and you do not want to share the pages with other processes, you use MAP_PRIVATE. That is fully legitimate with typical memory: in conjunction with PROT_READ the kernel uses copy-on-write and allocates the page when it is referenced for the first time.

Wait... what did I just say? It is copied ...
So remember: never use MAP_PRIVATE when mapping PCI BARs, because it will COPY the BAR memory. In other words, references to the mapped memory will not reach the PCI device. You will no longer operate on the device address space; instead you will operate on a copy of the PCI address space sitting in RAM.
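
For completeness, here is a minimal sketch of the variant that actually reaches the device, using MAP_SHARED. BAR0_OFFSET and BAR0_SIZE are just placeholders here; in a real program you would take them from the device's PCI config space (for example from the sysfs resource file).

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define BAR0_OFFSET 0xf0000000UL  /* placeholder physical address of BAR0 */
#define BAR0_SIZE   0x10000UL     /* placeholder BAR0 size */

int main(void)
{
    /* O_SYNC is commonly used with /dev/mem to avoid a cached mapping. */
    int mem_fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (mem_fd < 0) {
        perror("open /dev/mem");
        return EXIT_FAILURE;
    }

    /* MAP_SHARED keeps the mapping backed by the device itself,
     * so loads and stores really reach the PCI BAR. */
    volatile void *bar0 = mmap(NULL, BAR0_SIZE, PROT_READ | PROT_WRITE,
                               MAP_SHARED, mem_fd, BAR0_OFFSET);
    if (bar0 == MAP_FAILED) {
        perror("mmap");
        close(mem_fd);
        return EXIT_FAILURE;
    }

    /* ... access device registers through bar0 ... */

    munmap((void *)bar0, BAR0_SIZE);
    close(mem_fd);
    return EXIT_SUCCESS;
}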

2015/09/29

Why WaitForMultipleObjects() is bad and why useful things seem to get lost in the dark corners of the internet much more easily than pictures of funny cats

Some time ago, I did a small survey of embedded RTOSes. I also published my personal opinion about them on my second blog.

Recently I have been preparing a newspaper article and wanted to recall an interesting point of view on why the Microsoft Windows WaitForMultipleObjects() family of functions is a bad idea. And guess what ... I spent almost an hour finding that post on the internet ... ughh

So, since things like to get lost in the dark corners of the internet ... I insolently decided to copy-paste the post :D ... the original post can (at the time of writing this note) still be found at the address quoted below.


The author of the following post is Kaz Kylhe (Mon, 19 Apr 1999 04:00:00).
Original thread name: "pthreads and waiting for MULTIPLE conditions"
Site: http://www.verycomputer.com/5_c6341d98f68da45e_1.htm


Andy Levin was asking:

I am porting some NT work I wrote a while back to the UNIX world and have the need to use pthreads.
Is there any functionality in the pthread libraries (or elsewhere) that allows you to wait for MULTIPLE conditions to occur.

Kaz Kylhe replies:

This is not needed. Think about it. You have one thread and you want to
put it to sleep. In a typical OS kernel, all processes or threads sleep
in one ``place'' at a time, such as a supend queue and whatnot.

What you are asking for is like wanting to sleep in two or more beds
at the same time. This, of course, is only possible if you are the president of
the United States.

> I am looking for functionality similar to the Win32 function WaitForMultipleObjects()

That Win32 function is seriously misconceived and should be avoided even in
Win32 programming. It has problems. For example, in the case that any *one* of
the objects can wake up the thread, there is potential starvation by one object
that is constantly going off, since the function only reports one object during
one call. (Contrast this to the much cleverer POSIX select() function that
gives you a bitmap representing objects that are ready, thus avoiding
starvation.) Another problem is that this function does not scale beyond 64
objects. If you need to wait for more events or what have you, you have to
launch additional threads. For example, if you wanted to wait for 256 objects
in Win32, you would launch four threads. Each thread would wait for 64
objects. And your ``parent'' thread would do a WaitForMultipleObjects on the
four thread handles. Ugly, ugly, ugly.

Programs that use WaitForMultipleObjects often break encapsulation; you often
see a technique whereby software modules export events, in effect saying ``here
is my event, I will set it when something interesting happens''. Needless
to say, this is hard to port.

A much better approach is to keep events buried in the implementation of
something; notification can be provided much more efficiently by direct message
passing. One object should not know about the internal events and threads of
another object.

For example, suppose that you have a thread that needs to wait until some other
object completes some task, or a timer goes off. One way to do this under Win32
might be to have two events. Your thread waits on both of them using
WaitForMultipleObjects. The events are exported to other objects (such as the
timer or the worker) which do a SetEvent. In other words, the use of Win32
events is made explicit in the interfaces between objects, a clearly
portability mistake.

A better way to do this might be to have registered callbacks. When the timer
goes off, a callback is invoked back to the waiting object. When the worker
object is done doing something, a different callback is invoked. Both
callbacks can signal the same Win32 event, so your thread just needs to wait on
that single one. This model is readily supported by POSIX condition variables,
and is easier to port among different threading platforms, because it is based
on a message passing abstraction that is directly supported by C and C++,
namely function calls.


>allows you to specify an array of synchronization objects and conditions wait for all, wait for any, etc) to wait for.

There is no etc: you can wait for all, or you can wait for any (with
potential starvation, requiring you to constantly permute the array
passed to WaitForMultipleObjects).
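
To make the callback / condition variable model above a bit more concrete, here is a minimal sketch of my own (not part of Kaz's post): a timer and a worker each invoke a small callback, both callbacks signal the same condition variable, and the waiting thread sleeps in exactly one place.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool timer_fired = false;
static bool work_done   = false;

/* "Callback" invoked when the timer goes off. */
static void on_timer(void)
{
    pthread_mutex_lock(&lock);
    timer_fired = true;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

/* "Callback" invoked when the worker finishes its task. */
static void on_work_done(void)
{
    pthread_mutex_lock(&lock);
    work_done = true;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

static void *timer_thread(void *arg)  { (void)arg; sleep(1); on_timer();     return NULL; }
static void *worker_thread(void *arg) { (void)arg; sleep(2); on_work_done(); return NULL; }

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, timer_thread, NULL);
    pthread_create(&t2, NULL, worker_thread, NULL);

    /* Single wait point: wake up when either source reports progress. */
    pthread_mutex_lock(&lock);
    while (!timer_fired && !work_done)
        pthread_cond_wait(&cond, &lock);
    printf("woke up: timer_fired=%d work_done=%d\n", timer_fired, work_done);
    pthread_mutex_unlock(&lock);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}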

2014/01/03

CPU architecture

http://ootbcomp.com/docs/belt/

2012/01/24

History

Today, with several of my colleagues, we recalled the old times of the Commodore 64 and such stuff. During the discussion I mentioned that I had a mentor at my university who sometimes told me stories like that (obviously from quite a bit earlier), about the times of punched cards and Odra computers.

It is funny that some of those stories live on in present times in the form of jokes or connotations:

POKE and POKE cheats, or the Killer POKE
Halt and Catch Fire
or even something as common as a BUG (the first bug is even stored in a museum ;))



2012/01/02

Microsoft CEO


Microsoft CEO - No comments :D

2011/12/27

Useful C macros

Over a couple of years of professional programming, I have collected some useful C macros which I use on a daily basis. I can't imagine programming without them. I have noticed that not many people use such macros, and even if they use similar ones, their versions are often vulnerable to macro parameter side effects and signed overflow hacks.

Because my macros are much better ;) I decided to share them. Some of them are my own creation, some are copied from other places (like the Linux kernel). Because of that I am obliged to release them under the GPL (I hope nobody will sue me over 4 lines of C code, even if he/she came up with the same obvious idea).

Some of those macros are just renamings that make the code more robust when you plan to migrate between different compilers. Others are useful for object-like programming in C. At the very least, some of them save some repeated coding ;)

/**
* Macro that masks the "unused parameter" warning when using the -Wall -Wextra
* compiler options. Those compiler options should be used to prevent typical
* C coding mistakes, while the unused macro allows silencing the warning when
* we really do not want to use all of the function parameters.
*/
#if defined(__GNUC__)
# define unused(x) UNUSED_ ## x __attribute__((unused))
#elif defined(__LCLINT__)
# define unused(x) /*@unused@*/ x
#else
# define unused(x) x
#endif

/**
* The likely/unlikely macros provide the compiler with branch prediction
* information. By explicitly giving such a hint you may instruct the compiler
* to produce code optimized for the more probable case. As a result the
* compiler can arrange the code sections for better utilization of the CPU
* instruction cache and for jump elimination (avoiding CPU pipeline flushes).
*/
#if __GNUC__ >= 3
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#warning Compiler does not support all extensions, some optimizations will have no effect
#define likely(x) (x)
#define unlikely(x) (x)
#endif
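
A hypothetical usage example of my own, just to show the intent: the validation failure is marked as the rare path, so the hot path stays linear in the instruction stream.

#include <stddef.h>

int parse_header(const unsigned char *buf, size_t len)
{
    if (unlikely(buf == NULL || len < 4))
        return -1;                /* rarely taken error path */

    if (likely(buf[0] == 0x7F))   /* assumed common case for this made-up format */
        return 0;                 /* fast path */

    return 1;
}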

/**
* always_inline tells GCC to inline the specified function regardless of
* whether optimization is enabled.
* deprecated tells you when a function has been deprecated and should no
* longer be used. If you attempt to use a deprecated function, you receive a
* warning. You can also apply this attribute to types and variables to
* encourage developers to wean themselves from those kernel assets.
* __used__ tells the compiler that this function is used regardless of
* whether GCC finds instances of calls to the function. This can be useful in
* cases where C functions are called from assembly.
* __const__ tells the compiler that a particular function has no state (that
* is, it uses the arguments passed in to generate a result to return).
* warn_unused_result forces the compiler to check that all callers check the
* result of the function. This ensures that callers properly validate the
* function result so that they can handle the appropriate errors.
*/
#if __GNUC__ >= 3
#define __always_inline__ __attribute__((always_inline))
#define __deprecated __attribute__((deprecated))
#define __attribute_used__ __attribute__((__used__))
#define __attribute_const__ __attribute__((__const__))
#define __must_check __attribute__((warn_unused_result))
#else
#define __always_inline__
#define __deprecated
#define __attribute_used__
#define __attribute_const__
#define __must_check
#endif

/**
* Portable prefetch macro for cache preloading using built-in CPU instructions.
* This macro allows producing more optimized code on CPUs equipped with a cache.
* Placing a prefetch instruction before a code section that uses some data may
* improve execution time, since the processor will load that data into the
* cache before the instructions request it. Without prefetch, the data load
* happens only when the instructions need the data, and the CPU stalls until
* the data arrives in the cache.
*/
#if __GNUC__ >= 3
#define prefetch(x) __builtin_prefetch(x)
#else
#define prefetch(x)
#endif
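
A hypothetical usage example: while processing one node of a linked list, we ask the CPU to start fetching the next one. On typical CPUs a prefetch of an invalid or NULL address does not fault, which is why prefetching n->next before checking it is a common pattern.

struct node {
    struct node *next;
    int value;
};

long sum_list(const struct node *n)
{
    long sum = 0;
    while (n != NULL) {
        prefetch(n->next);   /* start loading the next node early */
        sum += n->value;
        n = n->next;
    }
    return sum;
}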

/**
* The restrict keyword can be applied to pointer variables which point to
* content that is not accessible through any other pointer or parameter.
* The main purpose of this keyword is to instruct the compiler that the
* content behind the particular pointer will not be changed in any way other
* than through an explicit dereference of that pointer. This allows the
* compiler to produce more optimized code.
*
* Without the restrict keyword, the compiler has to assume that any two
* pointers may point to the same memory location. Because of that, the
* compiler cannot reuse temporary values, since they may be invalidated by
* any other indirect (pointer) write access.
*/
#if __GNUC__ >= 3
#define restrict __restrict__
#else
#define restrict
#endif
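
A hypothetical usage example: with restrict-qualified parameters the compiler may assume dst and src never alias, so it does not have to reload src[i] after every store to dst[i].

#include <stddef.h>

void add_arrays(int *restrict dst, const int *restrict src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] += src[i];   /* no aliasing assumed between dst and src */
}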

/** \note gcc-4.0.1/gcc/Return-Address.html On some machines it may be
* impossible to determine the return address of any function other than the
* current one; in such cases, or when the top of the stack has been reached,
* this builtin will return 0 or a random value. In addition,
* __builtin_frame_address may be used to determine if the top of the stack
* has been reached. For portability, our calleraddr() macro returns only the
* return address of the current function. */
#if __GNUC__ >= 3
#define calleraddr() __builtin_return_address(0)
#else
#define calleraddr() NULL
#endif

/**
* Common macro that calculates the offset of a field within a structure.
*
* @param _type Type of the parent
* @param _member Name of the member inside the parent
*
* @return Offset in bytes (size_t) of the member from the beginning of the parent.
*/
#ifdef __compiler_offsetof
#define offsetof(_type,_member) __compiler_offsetof(_type, _member)
#else
#define offsetof(_type, _member) ((size_t) &(((_type *)NULL)->_member))
#endif

/** Common macro that gets the size of a member in a structure or union */
#define sizeoffield(_type, _member) (sizeof(((_type *)NULL)->_member))

/**
* Common macro that calculates a pointer to the parent object from a pointer
* to one of its members, the name of that member inside the parent, and the
* parent object type.
* Using the temporary pointer _mptr is necessary to prevent macro side effects
* for operands like pointer++.
*
* @param _ptr Pointer to the member
* @param _type Parent object type
* @param _member Name of the member inside the parent object
*
* @return Pointer to the parent
*/
#define container_of(_ptr, _type, _member) ({ \
const typeof( ((_type *)0)->_member ) *_mptr = (_ptr); \
(_type *)( (char *)_mptr - offsetof(_type,_member) );})
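
A hypothetical usage example: the classic intrusive-list pattern, where the list node is embedded inside a parent structure and container_of recovers the parent from a node pointer.

struct list_node {
    struct list_node *next;
};

struct packet {
    int id;
    struct list_node node;   /* member embedded inside the parent */
};

static struct packet *packet_from_node(struct list_node *n)
{
    return container_of(n, struct packet, node);
}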

/*
* Common min macro with strict type checking;
* returns the smaller of the two operands.
*
* Strict type checking is an important aspect of secure code
* (mixed signed/unsigned comparison is a common source of exploitable bugs).
* Using temporary values is necessary to prevent macro side effects for
* operands like variable++.
*
* @param _x First value
* @param _y Second value
*
* @return Smaller of the two passed values
*/
#define min(_x, _y) ({ \
typeof(_x) _min1 = (_x); \
typeof(_y) _min2 = (_y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })

/*
* Common max macro with strict type checking;
* returns the greater of the two operands.
*
* Strict type checking is an important aspect of secure code
* (mixed signed/unsigned comparison is a common source of exploitable bugs).
* Using temporary values is necessary to prevent macro side effects for
* operands like variable++.
*
* @param _x First value
* @param _y Second value
*
* @return Greater of the two passed values
*/
#define max(_x, _y) ({ \
typeof(_x) _max1 = (_x); \
typeof(_y) _max2 = (_y); \
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })

/**
* Macro used to calculate ceiling(_x/_y); the macro is type sensitive.
* It cannot be used with floating point types;
* for those, please use the ceil function from math.h.
*
* @param _x dividend
* @param _y divisor
*
* @return ceil(_x/_y) with the same type as _x operand
*/
#define ceil_div(_x, _y) ({ \
typeof(_x) __x = (_x); \
typeof(_y) __y = (_y); \
(void) (&__x == &__y); \
typeof(_x) _rem = __x % __y; \
typeof(_x) _div = __x / __y; \
(_rem > 0) ? (_div + 1) : _div; })

/**
* Version of ceil_div without strict type checking, use with care
* Warning this version has common macro side effects for operands which use
* ++ or --
*/
#define ceil_div_nocheck(x, y) (((x) / (y)) + (((x) % (y)) ? 1 : 0))

/**
* Version of min without strict type checking, use with care
* Warning this version has common macro side effects for operands which use
* ++ or --
*/
#define min_nocheck(x, y) (((x) < (y)) ? (x) : (y))

/**
* Version of max without strict type checking, use with care
* Warning this version has common macro side effects for operands which use
* ++ or --
*/
#define max_nocheck(x, y) (((x) > (y)) ? (x) : (y))

/**
* Macro that clamps the value to the given range; the macro performs strict type checking.
*
* Strict type checking is an important aspect of secure code
* (mixed signed/unsigned comparison is a common source of exploitable bugs).
* Using temporary values is necessary to prevent macro side effects for
* operands like variable++.
*
* @param _val Value to clamp
* @param _min Lower bound
* @param _max Upper bound
* @return Clamped value
*/
#define clamp(_val, _min, _max) ({ \
typeof(_val) __val = (_val); \
typeof(_min) __min = (_min); \
typeof(_max) __max = (_max); \
(void) (&__val == &__min); \
(void) (&__val == &__max); \
__val = __val < __min ? __min: __val; \
__val > __max ? __max: __val; })

/**
* Macro clamps the value to the given range using val param type
*
* This macro does no type checking and uses temporary variables of whatever
* type the input argument 'val' is. This is useful when val is an unsigned
* type and min and max are literals that will otherwise be assigned a signed
* integer type.
*
* @param _val Clamped value
* @param _min Lower bound
* @param _max Upper bound
* @return Clamped value
*/
#define clamp_val(_val, _min, _max) ({ \
typeof(_val) __val = (_val); \
typeof(_val) __min = (_min); \
typeof(_val) __max = (_max); \
__val = __val < __min ? __min: __val; \
__val > __max ? __max: __val; })

/**
* Macro that returns the number of elements in a table (array).
*
* @param _table Table (must be a real array, not a pointer, for sizeof to work)
* @return Number of table elements; the return type is size_t
*/
#define element_cnt(_table) (sizeof((_table)) / sizeof((_table)[0]))

/**
* Macro that rounds up an address according to the platform's basic type alignment rules.
* As an example, on 32-bit platforms it will return the address aligned up to 32 bits.
*
* @param _addr Unaligned address
* @return Aligned address
*/
#define addr_allign(_addr) \
( (void*)(ceil_div_nocheck((unsigned long)(_addr), sizeof(long)) * sizeof(long)) )

/**
* Macro that rounds up a size according to the platform's basic type alignment rules.
* As an example, on 32-bit platforms it will return the size aligned up to 32 bits.
*
* @param _size Unaligned size
* @return Aligned size
*/
#define size_allign(_size) \
( (size_t)(ceil_div_nocheck((size_t)(_size), sizeof(long)) * sizeof(long)) )

/**
* \brief malloc with out-of-memory protection and memory zeroing
*/
#define safe_alloc(_type) ({ \
_type *ptr = malloc(sizeof(_type)); \
assert(NULL != ptr); \
memset(ptr, 0, sizeof(_type)); \
ptr; })

/**
* \brief free with "reference after free" protection
*/
#define safe_free(_ptrtype) \
do { \
memset(_ptrtype, 0xAB, sizeof(*(_ptrtype))); /* 0xAB is a recognizable poison pattern */ \
free(_ptrtype); \
(_ptrtype) = NULL;\
} while(0)
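
A hypothetical usage example (assuming <stdlib.h>, <string.h> and <assert.h> are included for malloc, memset and assert):

struct config {
    int  retries;
    char name[32];
};

void config_example(void)
{
    struct config *cfg = safe_alloc(struct config);  /* zero-filled, asserts on OOM */
    cfg->retries = 3;

    safe_free(cfg);   /* memory poisoned with 0xAB, cfg set to NULL */
}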

/**
* Compile-time assertion useful for checks done at compilation time.
* It provides functionality similar to the assert macro (which is evaluated
* at execution time). static_assert is evaluated during the compilation
* stage (not during preprocessing). Because of that it may be used in
* conjunction with const variables or C expressions like sizeof(type).
*
* \remarks It relies on the 'unused' macro and assumes that the 'unused' macro
* adds the "UNUSED_" prefix to the name. It might generate an "unused variable"
* warning if the 'unused' macro is disabled.
*/
#define static_assert(expr) \
static const char unused(unique_name [(expr)?1:-1]) = {'!'}
#define unique_name make_name(__LINE__)
#define make_name(line) make_name2(line)
#define make_name2(line) constraint_ ## line
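
A hypothetical usage example: break the build when a wire-format structure does not have the expected size (8 bytes on typical ABIs) or when a platform type-size assumption does not hold.

struct proto_hdr {
    unsigned short type;
    unsigned short len;
    unsigned int   crc;
};

static_assert(sizeof(struct proto_hdr) == 8);   /* compilation fails if padding sneaks in */
static_assert(sizeof(long) >= sizeof(int));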