Forever tape abstraction


Computers offer many different places to put data, and each one trades speed against capacity. For example, we have the stack, main memory, virtual memory, page files, disk files, buffers, the heap, S3, SANs and NFS. Can we create an abstraction that captures the differences between these kinds of storage, manages each of them efficiently, and switches between them so that memory appears infinite yet stays efficient? For example, you might want to load chunks from S3 into main memory efficiently and cache them on disk.
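
To make the idea concrete, here is a minimal sketch of a three-tier lookup: memory first, then a disk cache, then a slow remote source. Everything in it is an assumption for illustration: the class name TieredChunkStore, the fetch_remote callable (a stand-in for something like an S3 range read, not a real S3 client), and the simple least-recently-used eviction.

```python
import os
from collections import OrderedDict


class TieredChunkStore:
    """Hypothetical sketch: serve chunks from memory, then disk, then remote."""

    def __init__(self, fetch_remote, cache_dir, max_in_memory=4):
        # fetch_remote(index) -> bytes stands in for a slow remote read.
        self.fetch_remote = fetch_remote
        self.cache_dir = cache_dir
        self.max_in_memory = max_in_memory
        self.memory = OrderedDict()  # chunk index -> bytes, oldest first
        os.makedirs(cache_dir, exist_ok=True)

    def _disk_path(self, index):
        return os.path.join(self.cache_dir, f"chunk-{index}.bin")

    def get(self, index):
        # Tier 1: main memory.
        if index in self.memory:
            self.memory.move_to_end(index)
            return self.memory[index]

        # Tier 2: local disk cache.
        path = self._disk_path(index)
        if os.path.exists(path):
            with open(path, "rb") as f:
                data = f.read()
        else:
            # Tier 3: remote source; keep a copy on disk for next time.
            data = self.fetch_remote(index)
            with open(path, "wb") as f:
                f.write(data)

        # Promote to memory and evict the least recently used chunk.
        self.memory[index] = data
        if len(self.memory) > self.max_in_memory:
            self.memory.popitem(last=False)
        return data
```

The caller only ever asks for get(index) and never learns which tier answered, which is the illusion I am after. A real version would also need write-back, invalidation and a size limit on the disk tier.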


Data warehouses work with petabytes of data but require custom coding to do so.

Databases and the Linux kernel do clever things with pages to move data in and out of main memory. I feel the same techniques could be applied to application-level programming.

For example, I want to be able to write a script that works on both a 100 KB file and a 100 petabyte file in reasonable time. It could do this by managing memory and spreading out the processing.

I think it is the responsibility of a language to solve these problems once and solve them well. I shouldn't need custom coding.

I could write my program to use a streaming approach rather than loading everything into memory.
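
As a rough sketch of the streaming approach, the code below reads a file in fixed-size chunks so that memory use stays bounded regardless of file size. The function names and the 1 MiB default chunk size are my own choices for illustration.

```python
def stream_chunks(path, chunk_size=1 << 20):
    """Yield successive chunks of a file without loading it whole."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            yield chunk


def count_newlines(path, chunk_size=1 << 20):
    """Count newline bytes while holding at most one chunk in memory."""
    return sum(chunk.count(b"\n") for chunk in stream_chunks(path, chunk_size))
```

The catch is that the program has to be written around the chunk boundary, which is exactly the kind of custom coding I would like the language to handle for me.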

I could use mmap to map the file and rely on the operating system to manage loading chunks of data into memory.
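
A minimal sketch of the mmap approach, using Python's standard mmap module: the file is mapped read-only and searched as if it were one big bytes object, while the operating system pages data in on demand. The function name is hypothetical, and the empty-file check is there because an empty file cannot be mapped.

```python
import mmap
import os


def count_newlines_mmap(path):
    """Count newline bytes through a read-only memory map of the file."""
    with open(path, "rb") as f:
        # An empty file cannot be mapped, so handle it up front.
        if os.fstat(f.fileno()).st_size == 0:
            return 0
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(b"\n")
            while pos != -1:
                count += 1
                pos = mm.find(b"\n", pos + 1)
            return count
```

Here the code never mentions chunks at all, so it looks the same for a small file and a very large one, as long as the file fits in the process's address space.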

