Parent category: Database technology

Document based databases

How should we use document based databases?


MarkLogic, MongoDB, and Couchbase are document databases.

They are often very fast.

Unfortunately, compared to relational databases, they often do not support efficient joins.






How to build a document based database that supports joins?

A common weakness of document databases is that they often do not support efficient joins, because they represent each document record as an opaque blob.

I created the attached project to describe how I plan to implement document based storage using key-value storage as the backend. Each field of the document becomes a separate key, and the keys are arranged so that range scans are efficient and hash joins can take place efficiently.
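To make the idea concrete, here is a minimal sketch (hypothetical key layout, not the attached project's actual code) of flattening a document into one key per field, where dotted-path keys sort lexicographically, so all keys of one document sit together in an ordered key-value store such as RocksDB and can be fetched with a single prefix range scan:

```python
def flatten(doc, prefix=""):
    """Flatten a nested JSON-like document into (key, value) pairs.

    Keys are dotted paths, so every key of one document shares a
    prefix and sorts together in an ordered key-value store,
    making prefix range scans cheap.
    """
    pairs = []
    if isinstance(doc, dict):
        for field, value in doc.items():
            path = f"{prefix}.{field}" if prefix else field
            pairs.extend(flatten(value, path))
    elif isinstance(doc, list):
        for i, item in enumerate(doc):
            pairs.extend(flatten(item, f"{prefix}.{i}"))
    else:
        pairs.append((prefix, doc))
    return pairs

doc = {"name": "Samuel Squire",
       "job": {"currentJob": {"company": {"employeeCount": 2500}}}}
for key, value in sorted(flatten(doc)):
    print(key, "=", value)
# job.currentJob.company.employeeCount = 2500
# name = Samuel Squire
```

In a real store, each pair would also be prefixed with a collection name and document id so that documents do not collide.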

    : Bassxn2
    :  -- 
    :  -- 





The keyword is "efficient". Efficiency is inversely proportional to computational complexity, so I assume you are looking for new algorithms for joins over unstructured data.

First, the problem is already solved in SQL databases, right? Why not take a look at that implementation, and take it from there?

Let's say we have raw data as records of JSON (or dictionaries, hashmaps). What you're concerned about then is efficient querying, which is a matter of indexing (query optimization, or query algorithms). We routinely index SQL databases into ElasticSearch, because SQL databases are not good enough, or not flexible enough, at the kinds of text search users care about: we use another data system that is good at it, and keep a copy of the data there. Not very space-saving, but it works.

We could do the same with NoSQL. If you need join-like queries, just "index" the data into a SQL database, using a specialized process that interprets and migrates the data on the fly, working as a complementary job in concert with the NoSQL store, always looking for new fields and creating those fields in the complementary SQL database.

Sure, using many databases at once is not an elegant solution, so I agree that document based databases need improvement. After all, schemas are not non-existent: every record implies a schema of some sort, and when sufficiently many records share certain fields, that may justify creating a new SQL field or foreign key. Think of it like a brain that realizes new "laws of physics" once it has seen sufficiently many examples of a specific type...
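A rough sketch of that complementary job, assuming SQLite as the relational side and made-up record contents (table and field names are illustrative, not from any real system): scan incoming JSON records, add any newly seen top-level field as a SQL column, and mirror the records into the table so joins can run in SQL.

```python
import sqlite3

def index_into_sql(conn, table, records):
    """Mirror JSON records into a SQL table, growing the schema
    whenever a record carries a field we have not seen before."""
    cur = conn.cursor()
    cur.execute(f"CREATE TABLE IF NOT EXISTS {table} (_id TEXT PRIMARY KEY)")
    # Columns already present in the table.
    known = {row[1] for row in cur.execute(f"PRAGMA table_info({table})")}
    for rec in records:
        for field in rec:
            if field not in known:
                # A newly shared field justifies a new SQL column.
                cur.execute(f"ALTER TABLE {table} ADD COLUMN {field} TEXT")
                known.add(field)
        cols = ", ".join(rec)
        marks = ", ".join("?" for _ in rec)
        cur.execute(f"INSERT OR REPLACE INTO {table} ({cols}) VALUES ({marks})",
                    [str(v) for v in rec.values()])
    conn.commit()

conn = sqlite3.connect(":memory:")
index_into_sql(conn, "people", [
    {"_id": "1", "name": "Samuel Squire"},
    {"_id": "2", "name": "Mindey", "city": "Vilnius"},  # introduces "city"
])
```

(A production version would sanitize field names before interpolating them into SQL and handle nested objects; this only shows the schema-growing idea.)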




I've designed a keyspace for JSON that is fast to decode back into JSON and fast to scan with a RocksDB key-value range scan.

This lets us do a regular hash join, as a relational database does.

These JSON documents

    {
      "name": "Samuel Squire",
      "job": {
        "currentJob": {"company": {"employeeCount": 2500}}
      }
    }

    {
      "_id": "1",
      "name": "Samuel Squire",
      "hobbies": [
        {"name": "God"}, {"name": "databases"}, {"name": "multicomputer systems"}
      ]
    }

are turned into at least the following key-value objects:

    0.0 = "Samuel Squire" = "2500"

    0.0 = "Samuel Squire" = "God" = "databases" = "multicomputer systems"

Essentially, they form a flat structure of the document with keys.

"type people": "object",

"type people.*": "list",

"type people.*.0": "string",

"type people.*.1": "list",

"type people..1..0": "string",

"type people..1.": "object",

"type people.*.2": "object",

"type people.*.2.0": "object",

"type people.*.2.0.0": "object",

"type people.*.": "number",

"field people.*": "LIST",

"field people..1.": "LIST",

"field people.*.0": "name",

"field people.*.1": "hobbies",

"field people..1..0": "name",

"field people.*.2": "job",

"field people.*.2.0": "currentJob",

"field people.*.2.0.0": "company",

"field people.*.": "employeeCount",

"field people": "people",

"field people.*": "LIST",

"field people.*.3": "words",

"field people..3.": "LIST",

"field people..3..*":"LIST",

"field people..3...": "LIST",

"type people.*.3": "list",

"type people..3.": "list",

"type people..3..*": "list",

"type people..3...": "list",

"type people..3....*": "number"
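For the "fast to decode back into JSON" direction, here is a sketch of the inverse operation under the same dotted-key assumption: because keys sort by path, one range scan returns every pair of a document in order, and the document can be rebuilt without reading anything else. (For simplicity this sketch rebuilds list indices as dictionary keys; the real keyspace's "type ... = list" entries would tell the decoder to emit an array instead.)

```python
def unflatten(pairs):
    """Rebuild a nested document from (dotted-key, value) pairs,
    as produced by a prefix range scan over the keyspace."""
    root = {}
    for key, value in pairs:
        parts = key.split(".")
        node = root
        for part in parts[:-1]:
            # Walk/create intermediate objects along the path.
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return root

pairs = [("job.currentJob.company.employeeCount", 2500),
         ("name", "Samuel Squire")]
print(unflatten(pairs))
```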

    : Mindey
    :  -- 
    :  -- 




Your idea of synchronizing a SQL database with a document store is similar to my thought of synchronizing a SQL database with DynamoDB, which is a fast key-value store.

I want the best of NoSQL performance but the power of SQL joins.

    : Mindey
    :  -- 
    :  --