{
"$type": "site.standard.document",
"description": "This week I went to Berlin for 2 days to attend the MongoBerlin conference organized by 10gen, the creators of MongoDB. I took a lot of notes from the presentations, and I figured this may be useful for someone if they missed a presentation (or just couldn’t go to the conference).\n\nFeel free to correct me if I got something wrong :)\n\n“BRAINREPUBLIC: A high-scale web-application using MongoDB and RabbitMQ”\n\nMongoDB vs CouchDB:\n\nCouchDB - slower (because it uses a HTTP API), easier replication\nMongoDB - faster\nthey needed high performance, so they chose MongoDB\n\nMongo has a rich query API that lets you do more than Couch’s map/reduce\nthey use RabbitMQ, Solr, Varnish proxies and Heartbeat\nMongoDB is a swiss knife of NoSQL, very fast, reliable and very easy to use\nNoSQL doesn’t mean you can be lazy about proper application and database design\ncurrent problems with MongoDB:\n\nquite slow replication speed (someone from the audience said that replica sets are fast)\nindexes can be very slow if they don’t fit into memory\nsimple authentication - sometimes too simple\nmap/reduce is single-threaded (locks the machine)\n\n“MongoDB Internals: How It Works”\n\nwhat does an insert do:\n\nconstructs Object ID (includes timestamp, machine/process ID, counter); two clients at the same time can’t generate the same ID, and IDs always increase with time, so order by id = order by time\nconverts data to BSON\nsends a TCP message (using MongoDB Wire Protocol)\n\nquery:\n\nreturned as a cursor: first n results + cursor id, and then you can get next results using the cursor (n depends on the size of documents)\ncontrary to relational databases, cursors are not associated with sockets, and can be shared between connections; server cleans up cursors periodically if you don’t use them (this gives more flexibility in connection management on the client side)\n\nauthentication - should rarely be required, it’s recommended to protect the database at the network level (using firewalls)\nquery optimizer:\n\nit’s empirical, i.e. at first it tries all possible ways to get the results, and then remembers which one works best (it runs all algorithms in parallel and finishes as soon as one of them finishes), then reuses that knowledge in future requests\nif the selected algorithm becomes very slow, it tries all possible ways again\nso first time a query is called, it might be quite slow\non the other hand, if something changes later, e.g. an index becomes slow, Mongo will work around that\n\ncommands:\n\nreturn short values as results, not actual data (there’s a limit on how much data you can return from a command)\ninternally, they’re performed as finds on $cmd collection\nall commands can be invoked via runCommand with the name as one of parameters\n\nsafe mode:\n\nnormally, commands return as soon as the request is written to the socket\nif you want to wait until it arrives on the other side and is executed, you use safe mode\nit uses getlasterror command, which waits until the operation is executed, and returns an…",
"path": "/2010/10/10/notes-from-the-mongoberlin-conference/",
"publishedAt": "2010-10-10T20:52:17Z",
"site": "at://did:plc:oio4hkxaop4ao4wz2pp3f4cr/site.standard.publication/3mn5mackuba26",
"tags": [
"Databases"
],
"textContent": "This week I went to Berlin for 2 days to attend the MongoBerlin conference organized by 10gen, the creators of MongoDB. I took a lot of notes from the presentations, and I figured this may be useful for someone if they missed a presentation (or just couldn’t go to the conference).\n\nFeel free to correct me if I got something wrong :)\n\n“BRAINREPUBLIC: A high-scale web-application using MongoDB and RabbitMQ”\n\nMongoDB vs CouchDB:\n\nCouchDB - slower (because it uses a HTTP API), easier replication\nMongoDB - faster\nthey needed high performance, so they chose MongoDB\n\nMongo has a rich query API that lets you do more than Couch’s map/reduce\nthey use RabbitMQ, Solr, Varnish proxies and Heartbeat\nMongoDB is a swiss knife of NoSQL, very fast, reliable and very easy to use\nNoSQL doesn’t mean you can be lazy about proper application and database design\ncurrent problems with MongoDB:\n\nquite slow replication speed (someone from the audience said that replica sets are fast)\nindexes can be very slow if they don’t fit into memory\nsimple authentication - sometimes too simple\nmap/reduce is single-threaded (locks the machine)\n\n“MongoDB Internals: How It Works”\n\nwhat does an insert do:\n\nconstructs Object ID (includes timestamp, machine/process ID, counter); two clients at the same time can’t generate the same ID, and IDs always increase with time, so order by id = order by time\nconverts data to BSON\nsends a TCP message (using MongoDB Wire Protocol)\n\nquery:\n\nreturned as a cursor: first n results + cursor id, and then you can get next results using the cursor (n depends on the size of documents)\ncontrary to relational databases, cursors are not associated with sockets, and can be shared between connections; server cleans up cursors periodically if you don’t use them (this gives more flexibility in connection management on the client side)\n\nauthentication - should rarely be required, it’s recommended to protect the database at the network level (using firewalls)\nquery optimizer:\n\nit’s empirical, i.e. at first it tries all possible ways to get the results, and then remembers which one works best (it runs all algorithms in parallel and finishes as soon as one of them finishes), then reuses that knowledge in future requests\nif the selected algorithm becomes very slow, it tries all possible ways again\nso first time a query is called, it might be quite slow\non the other hand, if something changes later, e.g. an index becomes slow, Mongo will work around that\n\ncommands:\n\nreturn short values as results, not actual data (there’s a limit on how much data you can return from a command)\ninternally, they’re performed as finds on $cmd collection\nall commands can be invoked via runCommand with the name as one of parameters\n\nsafe mode:\n\nnormally, commands return as soon as the request is written to the socket\nif you want to wait until it arrives on the other side and is executed, you use safe mode\nit uses getlasterror command, which waits until the operation is executed, and returns an error if there was an error, and nothing if the operation succeeded\ngetpreviouserror - returns last error that actually happened (not necessarily from last request)\n\ndatabase files:\n\ndata and indexes are stored in foo.0, foo.1 etc.\nmetadata about namespaces (collections) is stored in foo.ns\ndatabase files are preallocated (so don’t go crazy if the files get big - they may be mostly empty)\none set of files per database, all in one directory (there’s an option to put them in separate directories)\n\n$freelist namespace - list of areas that can be reused (because data was deleted)\ndocument record contains:\n\nheader (includes: size, links to previous/next, info about how fast the record grows to estimate how much padding is needed)\nBSON data\npadding\nit’s good to design schema so that the document size doesn’t change very often, because then the server has to move the data often\n\ncapped collection:\n\ncollection with a limited size\nif it hits the limit, it deletes the oldest documents\nuseful for logs\n\n“Indexing and Query Optimizer”\n\nit’s often difficult to guess up front what indexes are needed, you need to know your actual, real query patterns\na collection may have 64 indexes, but a query can only use one at a time\nindexes aren’t free - they add linear cost to all inserts and updates\nfor compound indexes, order of keys is important\nthe indexer is smart enough to handle cases where a field is a number in some documents, and string in others, and optimizer will use that (for arrays, it adds one entry for each element to the index)\nif you query by regexp, it can use the index on that parameter, but only if it’s a left-anchored regexp (/…/)\ngeospatial search is a special case, it can only use indexes and won’t run without them\nin what queries indexes can’t be used:\n\nnegations\n$mod\nmost regexps\nJavaScript calls ($where, map/reduce)\nin general, if you can imagine how an index is used, it will be, and if you can’t, it won’t be\n\ndb.collection.find(…).explain - explain what exactly the query does:\n\ncursor:\n\nBtreeCursor - means it uses an index\nBasicCursor - table scan, doesn’t use an index\n\nnscanned - number of documents scanned\nif something is slow, and you think it has an index, check this first\nmake sure this isn’t the initial query run, where it tries all possible ways to run a query\n\nquery profiler - logs all queries slower than a given limit to the log\n\nhow to pick a limit: get some statistics of query run times, if you see two maximums on the graph and a minimum between them, pick that minimum\n\nif you really want to force Mongo to use an index:\n\nfind(…).hint({ x: 1 }) to use an index on x\nfind(…).hint({ $natural: 1 }) to disable indexes\nthis should almost never be necessary…\n\n“Sharding Internals”\n\nwhen to use sharding - when you:\n\nhave too much data to fit on one box\nhave too big indexes to fit into memory on one box\nwant to divide write load on many boxes to make writes faster\nwant to divide reads on many boxes and keep consistency (if you don’t need full consistency, just add more slaves)\n\nauto balancing:\n\nthe shards will automatically rebalance themselves in the background with no downtime when you add more nodes\nyou can also add sharding seamlessly without downtime to non-sharded master/slave or replica set; everything is consistent as if it was a single server\nthis won’t work if you want to change the sharding key on existing shards - too much data to move\n\nit’s up to you to decide how to partition the data, because it all depends on the kind of data you have\nexamples:\n\nuser profiles - shard by some kind of unique id (so lookup by id will go to single shard; lookup by other things, e.g. location, will have to check all shards)\nuser activity, timeline - shard by user id, so user’s timeline will load from a single shard\nphotos, like flickr - shard by photo id, because very often you’ll know the photo id\nlogging:\n\nyou could shard by date, but then all currently written logs will go to a single server and will put all load on that single server\nyou can divide by source machine, but then searching for all recent logs will have to access all shards\nyou could also divide by log type (e.g. mysql logs, apache logs)\n\nin general, for writes it’s good if they’re divided across shards, and for reads it’s good if they read from one shard\n\nalways think what exactly is the bottleneck (data, speed), why do you want to shard, and how the system is going to be used\nshard config servers:\n\nthey store all metadata about shards\nthey’re very important - if all of them go down, it’s very difficult to restore the database (not impossible, but very time consuming)\nthey use two-phase commits\nyou should have at least 3 of them\nif one goes down, metadata becomes temporarily read only, so you can’t add/modify shards during that time (but you can still add data)\nthey require very little resources, so can be installed on existing machines\n\neach shard is a master/slave pair or a replica set (or just a master, but that’s a bad idea)\nmongos - sharding router:\n\nacts as a frontend of the databases for the application\nclients can’t tell the difference between a regular mongo database and a sharding router, they access it in exactly the same way\nredirects all requests to the shards (using the metadata from a config server)\nyou can have as many as you want\none solution is to have one on every application server - then there’s no extra network load, because every application server accesses its own mongos on the same machine\n\nchunk migrating:\n\nmoving chunks (parts of a collection) to a different shard\nduring the copy, the origin keeps a log of all operations, because clients can still do writes in the meantime\nmigration is finished when the data is copied and all new operations are applied (when the target catches up with the changes on the source)\n\nbalancing: server automatically detects when a shard has too much data and moves some chunks to other shards\nusually only one or a few biggest collections are sharded, and the non-sharded data is kept on one shard which is automatically selected by the system as primary\n\nFoursquare example: only checkins collection is sharded, users and locations aren’t\n\nnumber of physical machines for sharding:\n\nusual number of machines initially used for sharding is 6 or 9\nbut you can start with 2 or 3 (one box can hold master for one shard and slave for another)\n\n“Replication Internals”\n\noplog:\n\nit’s a capped collection that stores recent operations on master\nit should be large enough to hold all operations for some period of time - how much exactly depends on the speed of adding new data\non master/slave setups oplog is kept only on master, and on replica sets - on all nodes\n\nsyncing: if slave is empty, it first copies all data from master, then applies all the changes from oplog that happened in the meantime\nsome operations are converted when added to oplog, to guarantee that they are idempotent:\n\nincs are translated to updates with $set\ndeletes are divided to one per record (because it won’t be possible to match the same set of records again)\n\nno-op operations are added to oplog in regular intervals to ensure that it isn’t paged out to disk\n–fastsync\n\nassumes that data on the slave is identical to the master data\nyou can use that if you copy all the data manually, e.g. by sending whole drives to another data center\nFedEx delivers replicated data at about 11 Mb/s ;)\n\nwith capped collections, it’s possible to do a ‘blocking read’ that will wait until new data is available (like tail -f)\nreplica pair: something old that was replaced by replica sets, don’t use that\nreplica sets:\n\ninstead of master and slave, we have primary and secondary, because they’re dynamic\nwhen primary is down, for a short time writes are blocked, and remaining nodes have an election process to find out which one has the newest data, and that node becomes a new primary\nsome really new data, that wasn’t written to any secondary nodes, may be lost\nwhen old primary reconnects, it marks all changes that didn’t make it into the new primary as invalid and logs them somewhere\nthere may be some problems with determining if the primary is dead or just inaccessible - may be solved by setting vote strengths, adding extra nodes, or adding arbiters (nodes that don’t store any data and only take part in the election process)\n\n“Scaling with MongoDB”\n\noften you can change performance by orders of magnitude only by rethinking the schema\nembedding:\n\nit’s great for read performance if you need to read the main object and all embedded objects together (there’s one request to load all objects)\nbut writes can be slow if you’re adding embedded objects all the time, because very often the document will have to be moved somewhere else on the disk\nif the updates to embedded objects are e.g. 1-2 per second then it will copy the data all the time\n\nindexing: if you have an index on (A, B) then you don’t need an index on A\nsometimes you need to think hard if having an index will bring better performance than not having it - because huge indexes that swap out constantly are very bad for performance\nright balanced indexes:\n\nindex is right balanced if all the new elements go to the right side (when they’re sorted by date or id)\nin such index, usually only the right side of the index needs to be kept in memory, even if the whole index is huge\nit might sometimes make sense to add a new key just to have a right balanced index\nuse this for data that is mostly accessed only for time, e.g. recent photos, messages, etc.\n\nworking set: you need to figure out how long a user stays on the site and how much data they’re going to need during that time (and how many users come simultaneously)\nhorizontal scaling:\n\nusually read scaling is most important, because most people just read the data, and only some of them modify it\nsimplest way: master + one or more slaves; slaves' data may be inconsistent at times\nfor data size and write scaling: add shards\ntypical setup for sharding: 9 servers (3 masters, 6 slaves)\nfor sharding, choose such keys that are more or less evenly distributed in the collection\n\nLightning talks:\n\n“Survival Guide: How to select the right database”\n\nhttp://nosql-database.org - about all nosql databases\nhttp://www.slideshare.net/jboner/scalability-availability-stability-patterns\n\n“Map/reduce”\n\nmap/reduce is useful e.g. for aggregating data from entire database in a batch task\nit can be thought of as a distributed grep command whose results are piped to another command that processes (reduces) the data\n\n“Indexing Plugins”\n\nif built-in Mongo indexing doesn’t satisfy your needs, write your own indexer\nyou can add new commands that will be called with runCommand\nright now you can’t use data outside of mongodb for indexing\n\n“MongoDB Roadmap”\n\nfeatures planned in 1.8:\n\nsingle server durability\nimproved replication\nfull text search",
"title": "Notes from the MongoBerlin conference",
"updatedAt": "2025-06-30T01:49:16Z"
}