Monday, May 5, 2014

compressing DRAM with ZRAM for fun and profit?

Theory:

Can you use compressed DRAM for science if you don't quite have enough memory?

TL;DR No

I'm going to file this under "Great idea, but my execution is slightly suspect"

Anyway, here's an example setup of compressed zram swap devices:
[root@jcair-vm ~]# modprobe zram

[root@jcair-vm ~]# mkswap /dev/zram0
Setting up swapspace version 1, size = 104860756 KiB
no label, UUID=58476253-ad5a-4595-9bec-60bd09d76d30

[root@jcair-vm ~]# mkswap /dev/zram1
Setting up swapspace version 1, size = 104860756 KiB
no label, UUID=ed5d0f85-0245-472e-902e-0e94a743cbe0

[root@jcair-vm ~]# swapon -p5 /dev/zram0 
[root@jcair-vm ~]# swapon -p5 /dev/zram1

[root@jcair-vm ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/zram0                              partition       104860752       0       5
/dev/zram1                              partition       104860752       0       5
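
For completeness: a zram device needs a disksize set before mkswap will accept it (the default is zero), and the number of devices is a module parameter. The transcript above skips those steps, so here's a rough sketch of the full sequence; the 100G sizes are purely illustrative to match the output above, and zram only consumes RAM as pages are actually written:

# load zram with two devices (num_devices is a module parameter)
modprobe zram num_devices=2

# give each device a size before mkswap (suffixes like G are accepted)
echo 100G > /sys/block/zram0/disksize
echo 100G > /sys/block/zram1/disksize

# format and enable both as swap at equal priority
mkswap /dev/zram0 && swapon -p5 /dev/zram0
mkswap /dev/zram1 && swapon -p5 /dev/zram1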

Clearly without the zram setup above, stress fails right out of the gate:
[root@jcair-vm ~]# stress --vm-bytes 2344600024 -m 2 --vm-keep
stress: info: [6063] dispatching hogs: 0 cpu, 0 io, 2 vm, 0 hdd
stress: FAIL: [6063] (415) -- worker 6065 got signal 9
stress: WARN: [6063] (417) now reaping child worker processes
stress: FAIL: [6063] (451) failed run completed in 10s

But running a stress test with a memory allocation much bigger than the host's physical memory seems to work just fine and dandy once we have our zram swap devices set up as noted above:
[root@jcair-vm ~]# stress --vm-bytes 2344600024 -m 2 --vm-keep
stress: info: [5383] dispatching hogs: 0 cpu, 0 io, 2 vm, 0 hdd

top - 10:41:17 up 13 days, 22:03,  4 users,  load average: 2.91, 0.87, 0.29
Tasks: 192 total,   4 running, 188 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us, 74.8%sy,  0.0%ni, 11.5%id,  0.0%wa,  0.1%hi, 13.4%si,  0.0%st
Mem:   3923468k total,  3840852k used,    82616k free,     5368k buffers
Swap: 209721504k total,   626964k used, 209094540k free,    36932k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5385 root      20   0 2242m 1.2g  124 R 96.4 31.3   0:48.12 stress
 5384 root      20   0 2242m 1.2g  124 R 84.0 32.0   0:48.12 stress

Yay! So - this looks like it could work!
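
One sanity check worth doing while stress is running is whether the swapped-out pages are actually compressing. Depending on your kernel, zram exposes per-device counters under /sys/block/zram0/ (e.g. orig_data_size and compr_data_size), so something like this gives a rough ratio, assuming some pages have already been swapped out:

# bytes handed to zram vs. bytes actually stored after compression
cat /sys/block/zram0/orig_data_size
cat /sys/block/zram0/compr_data_size

# rough compression ratio
echo "scale=2; $(cat /sys/block/zram0/orig_data_size) / $(cat /sys/block/zram0/compr_data_size)" | bc

Bear in mind that stress's worker pages are pretty much a constant fill pattern, which compresses almost perfectly, so any ratio you see here will flatter zram compared to real data.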

And so here we go with a genome aligner to see if this works for real. It should be a good test because it writes real data structures into memory, whereas stress was just doing a block fill. So first up, let's try without enough ram:
[root@jcair-vm ~]# cat go.sh 
./bowtie2/bowtie2 -x ./hg19 -p 4  <( zcat Sample_cd1m_3rdrun_1_ATCACG.R1.fastq.gz)

[root@jcair-vm ~]# ./go.sh 
Out of memory allocating the ebwt[] array for the Bowtie index.  Please try
again on a computer with more memory.

Error: Encountered internal Bowtie 2 exception (#1)

Command: /root/bowtie2/bowtie2-align-s --wrapper basic-0 -x ./hg19 -p 4 /dev/fd/63 
(ERR): bowtie2-align exited with value 1

Ok, fair enough, so we have a reproducer.
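
As a rough sanity check on how much memory the run actually wants, the size of the Bowtie 2 index on disk is a decent proxy, since the ebwt arrays get loaded pretty much wholesale (several GB for hg19, which squares with the RES we see below):

# the .bt2 files are the index bowtie2 loads into memory
du -ch hg19*.bt2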

Let's now set up a run with the right amount of physical ram:
[root@jcair-vm ~]# ./bowtie2/bowtie2 -x ./hg19 -p 4 <(cat cuff.fastq) -S out.dat &

7467 root 20 0 3606m 3.3g 1848 S 389.3 58.3 51:37.25 bowtie2-align-s

And we have a result!
[root@jcair-vm ~]# time ./bowtie2/bowtie2 -x ./hg19 -p 4 <(cat cuff.fastq)  -S out.dat 
13558597 reads; of these:
  13558597 (100.00%) were unpaired; of these:
  11697457 (86.27%) aligned 0 times
    545196 (4.02%) aligned exactly 1 time
   1315944 (9.71%) aligned >1 times
13.73% overall alignment rate

Ok, so let's shrink the memory of the machine and see if we can run with zram.

Let's also give physical swap and zram the same priority so the kernel round-robins between them: each device takes a share of the writes and reads, which should give us nicely balanced I/O (there's a sketch of the swapon commands after the output below). Since stress worked, the theory is that the in-memory data structures should compress at something like 1.5:1 or 2:1, effectively buying back memory. I settled on a 3G machine with ~3G of zram plus some physical swap:
[jcuff@jcair-vm ~]$ swapon -s
Filename                                Type            Size    Used    Priority
/dev/zram0                              partition       2947008 614124  1
/dev/dm-1                               partition       4063224 614176  1
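
With matching priorities the kernel stripes swap allocations across the devices round-robin, which is the balance the numbers above show. A minimal sketch of that hybrid setup, assuming a ~3G zram device alongside the existing /dev/dm-1 LVM swap volume:

# ~3G of compressed swap held in RAM
echo 3G > /sys/block/zram0/disksize
mkswap /dev/zram0

# same priority on both, so pages round-robin between zram and real swap
swapon -p1 /dev/zram0
swapon -p1 /dev/dm-1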

When running, it did result in a *much* smaller RES (982m vs 3.3G from the native example):

 2350 root      20   0 3606m 982m 1020 S 20.8 33.3  12:26.74 bowtie2-align-s

Things chugged along, but this clearly wasn't going to finish any time soon, so I truncated the read file dramatically to ca. 5k reads to get a quick comparison between zram alone, the hybrid zram-plus-swap setup, and plain old boring swap.
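
If you want to watch the thrash in progress, vmstat makes it pretty obvious; sustained si/so traffic with the CPU stuck in system and wait time is the tell:

# one report per second; si/so are the swap-in/swap-out rates
vmstat 1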

As you can see below, only "boring old swap" resulted in anything sensible. zram on its own caused some rather spectacular OOM errors and obvious system instability (it was kinda fun though). The timings below cover the various configurations we tried, none of which actually worked out, but we're also not totally alone here either.

Oh and: "Just the right amount of memory" - like Goldilocks, that one worked ;-)

Machine with memory too small:          (ERR): bowtie2-align exited

3G zram:                                sshd invoked oom-killer: gfp_mask=0x200da

Hybrid 3G zram + 4G physical swap:      6m 25.285s

Hybrid 500MB zram + 4G physical swap:   1m 51.029s

Regular /dev/dm-1 swap file:            0m 29.741s

Machine with enough ram:                0m 12.698s

In summary... NO PROFIT this time :-(

Still a neat idea - just don't try this at home, kids!


[any opinions here are all mine, and have absolutely nothing to do with my employer]
(c) 2011 James Cuff