Summary of "Big Brains, Big Data, and Design to Test Cloud Services at Scale"
Twitter summary of a meetup talk: http://www.meetup.com/Peking-Hackers/events/204436932/

Unit and functional tests alone aren’t enough to test large cloud services at scale. Cloud services not only need monitoring, but they also need continual testing. We will explore use of big data and deep technical understanding to verify a cloud service is working on each and every request. This big data puzzle is filled with lots of ambiguity, very messy and heterogeneous data, and a dynamic and changing datacenter and codebase. Obstacles include how to gather all the data we need to make product decisions, respect privacy, and not drown in useless data.

Chris Mitchell is a Principal Development Engineer for Microsoft Developer Center Norway.

Tweets by Kristofer Palmvik (@KPalmvik), September 22, 2014:

"Big brains, big data" with Chris Mitchell from MSFT at #pekinghackers pic.twitter.com/mOiE11I2Cz

Log everything. Logging an error after it occurred is too late. Manually turning on logging is out of question.

Telemetry is hard to get right. Need a mix of schema and textual debug output. Hard to save in a database with terabytes of data a day.

Use map-reduce to handle log data spread out over multiple nodes, instead of one huge database.

Parallelization of data operations can reduce the needed processing time from years to seconds.

Latency and timeouts are tricky to estimate and set. Usually a lot of wasted time. Log it and analyze.

Visualize the log entries to identify bottlenecks and problems.

Every time the service changes (upgrades, patches) look at the log for changes and patterns to identify problems

Store log data for a short period. Process and save the insights, throw away the rest.

Try to understand the data. Data is the new currency. Statistics is fundamental, and working knowledge is essential.

Logging about the logging, to avoid too expensive log generation.
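
The talk's actual tooling was not described in the tweets, so the following is only a minimal Python sketch of the map-reduce and latency points. It assumes a hypothetical log format of "<endpoint> <latency_ms>" per request, with one list of lines standing in for each node's log. The names `map_shard`, `merge`, `summarize`, `percentile_bucket`, and the 50 ms bucket width are all invented for illustration. Each node reduces its own log to a small latency histogram, the histograms are merged, and a percentile read from the merged result can inform a timeout. Only the compact histogram needs to be kept, and the raw lines can be discarded.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

BUCKET_MS = 50  # histogram bucket width; an arbitrary choice for this sketch


def map_shard(lines):
    """Map step: summarize one node's log lines into a latency histogram."""
    hist = Counter()
    for line in lines:
        # Assumed (hypothetical) format: "<endpoint> <latency_ms>"
        endpoint, latency = line.split()
        bucket = int(latency) // BUCKET_MS * BUCKET_MS
        hist[(endpoint, bucket)] += 1
    return hist


def merge(a, b):
    """Reduce step: histograms from different nodes simply add up."""
    return a + b


def summarize(shards):
    """Run the map step over all shards concurrently, then merge the results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_shard, shards))
    return reduce(merge, partials, Counter())


def percentile_bucket(hist, endpoint, pct):
    """Return the lower edge of the bucket holding the pct-th percentile."""
    buckets = sorted((b, n) for (e, b), n in hist.items() if e == endpoint)
    total = sum(n for _, n in buckets)
    seen = 0
    for bucket, n in buckets:
        seen += n
        if seen * 100 >= pct * total:
            return bucket
    raise ValueError(f"no data for {endpoint}")


# Two made-up shards, standing in for logs held on two different nodes.
SHARDS = [
    ["/search 120", "/search 80", "/login 900"],
    ["/search 100", "/login 300", "/login 310"],
]

combined = summarize(SHARDS)
print(percentile_bucket(combined, "/login", 50))
print(percentile_bucket(combined, "/login", 99))
```

Threads are used here only to keep the sketch self-contained. On real log volumes the map step would run where the data lives, and only the small histograms would travel over the network.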