stereochrome

Inklings in December 2011

ScummVM: Datafiles

Some games, like Lure of the Temptress, require extra files to work in ScummVM. This page lists them.

Inside the mind of the octopus

[R]esearchers who study octopuses are convinced that these boneless, alien animals—creatures whose ancestors diverged from the lineage that would lead to ours roughly 500 to 700 million years ago—have developed intelligence, emotions, and individual personalities. Their findings are challenging our understanding of consciousness itself.

Writing Vim plugins

RFC 6455: The WebSocket Protocol

Earley parser

[T]he Earley parser is an algorithm for parsing strings that belong to a given context-free language, named after its inventor, Jay Earley. The algorithm is a chart parser that uses dynamic programming, and is mainly used for parsing in computational linguistics.

Earley parsers are appealing because they can parse all context-free languages, unlike LR parsers and LL parsers which are more typically used in compilers but which can only handle restricted classes of languages. The Earley parser executes in cubic time in the general case O(n³), where n is the length of the parsed string, quadratic time for unambiguous grammars O(n²), and linear time for almost all LR(k) grammars. It performs particularly well when the rules are written left-recursively.
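
The predict/scan/complete loop behind that description fits in a short sketch. What follows is a bare recogniser (yes/no only, no parse trees) that assumes a grammar with no empty productions; `Item`, `earley_recognise` and the grammar-as-dict layout are illustrative choices, not taken from the article.

```python
from collections import namedtuple

# An Earley item: a rule (lhs -> rhs), a dot position inside rhs, and the
# input position at which recognition of this rule began.
Item = namedtuple("Item", "lhs rhs dot start")


def earley_recognise(grammar, start_symbol, tokens):
    """Return True if tokens can be derived from start_symbol.

    grammar maps each nonterminal to a list of right-hand sides (tuples of
    symbols). Any symbol that is not a key of grammar is a terminal.
    Assumption: no right-hand side is empty.
    """
    chart = [[] for _ in range(len(tokens) + 1)]

    def add(position, item):
        if item not in chart[position]:
            chart[position].append(item)

    for rhs in grammar[start_symbol]:
        add(0, Item(start_symbol, rhs, 0, 0))

    for position in range(len(tokens) + 1):
        # chart[position] can grow while it is processed, so index by hand.
        index = 0
        while index < len(chart[position]):
            item = chart[position][index]
            index += 1
            if item.dot < len(item.rhs):
                symbol = item.rhs[item.dot]
                if symbol in grammar:
                    # Predict: expand the nonterminal after the dot.
                    for rhs in grammar[symbol]:
                        add(position, Item(symbol, rhs, 0, position))
                elif position < len(tokens) and tokens[position] == symbol:
                    # Scan: the terminal after the dot matches the input.
                    add(position + 1, item._replace(dot=item.dot + 1))
            else:
                # Complete: advance every item that was waiting for item.lhs.
                for parent in chart[item.start]:
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        add(position, parent._replace(dot=parent.dot + 1))

    return any(
        item.lhs == start_symbol
        and item.dot == len(item.rhs)
        and item.start == 0
        for item in chart[len(tokens)]
    )


# A left-recursive grammar, which LL parsers cannot handle directly.
SUMS = {"E": [("E", "+", "n"), ("n",)]}
# An ambiguous grammar: "a a a" has more than one parse.
PAIRS = {"S": [("S", "S"), ("a",)]}
```

Calling `earley_recognise(SUMS, "E", ["n", "+", "n"])` accepts the input, while a dangling `["n", "+"]` is rejected. Keeping each chart entry as a list with a membership test is simple rather than fast; a serious implementation would index items by the symbol they are waiting for.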

Mozilla: WebAppSec/Secure Coding Guidelines

An Introduction to Asynchronous Programming and Twisted

“Introduction” might be something of an overstatement: this is long, pretty comprehensive, and some of the best documentation on Twisted I’ve seen yet.

SSH Productivity Tips

A decent basic bread recipe

Using Checkinstall With Virtualenv For Python Deployments

Checkinstall is new to me, but it seems like it’d ease some of my deployment issues quite a bit.

Haystack

Helps with integrating the likes of Solr and Xapian into Django websites.

Pykka

The goal of Pykka is to provide easy to use concurrency abstractions for Python by using the actor model. Pykka provides an actor API with two different implementations:

  • ThreadingActor is built on the Python Standard Library’s threading and Queue modules, and has no dependencies outside Python itself. It plays well together with non-actor threads.
  • GeventActor is built on the gevent library. gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of libevent event loop. It is generally faster, but doesn’t like playing with other threads.
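
To make the actor model concrete, here is a toy actor built on the same two standard library modules the ThreadingActor description mentions. It is a sketch of the idea only, not Pykka's code or API: the `Actor` and `Counter` classes and their methods are hypothetical names made up for illustration.

```python
import queue
import threading


class Actor:
    """A toy actor: one thread, one mailbox, one message handled at a time."""

    _STOP = object()

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def on_receive(self, message):
        """Handle one message; subclasses override this."""
        raise NotImplementedError

    def ask(self, message, timeout=5):
        """Send a message and block until the actor replies."""
        reply = queue.Queue(maxsize=1)
        self._mailbox.put((message, reply))
        return reply.get(timeout=timeout)

    def stop(self):
        """Ask the actor to finish its queued work and wait for it to exit."""
        self._mailbox.put((self._STOP, None))
        self._thread.join()

    def _run(self):
        while True:
            message, reply = self._mailbox.get()
            if message is self._STOP:
                break
            reply.put(self.on_receive(message))


class Counter(Actor):
    """Keeps a running total; only the actor's own thread touches it."""

    def __init__(self):
        self.total = 0  # set before the actor thread starts
        super().__init__()

    def on_receive(self, message):
        self.total += message
        return self.total
```

Because every message passes through the mailbox, `Counter` needs no locks even when several threads call `ask` at once. The sketch has no error handling: if `on_receive` raises, the actor thread dies and later `ask` calls time out.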

Ragel State Machine Compiler

Using SSL in Twisted

A protocol multiplexing daemon I wrote in work is, more than likely, going to be rewritten on top of Twisted (out of necessity to lower its rather massive memory usage more than anything). I’ll need to know this.

List of Yahoo! subnets that send email

Yahoo’s way of handling greylisting is hopelessly broken, so it’d be nice to avoid having email from their servers greylisted.

Caveat: I’m not sure how up-to-date this list actually is.

List of mail hosts that have problems with greylisting

Along with the Yahoo! list, I need to turn this into a lookup table that Postfix can use to exempt these hosts from greylisting. Similar caveats apply.
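
One way the conversion could go, assuming both lists boil down to one subnet or address per line: render them as a Postfix `cidr:` table, whose entries are a network pattern followed by a result. The function name and the choice of `OK` as the result are assumptions for illustration, and the addresses in the example are documentation ranges, not real mail hosts.

```python
import ipaddress


def cidr_table(subnets, action="OK"):
    """Render subnets as the lines of a Postfix cidr: lookup table."""
    lines = []
    for subnet in subnets:
        entry = subnet.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments in the source list
        # strict=False normalises host bits, e.g. 198.51.100.7/24 -> .0/24
        network = ipaddress.ip_network(entry, strict=False)
        lines.append(f"{network}\t{action}")
    return "\n".join(lines) + "\n"


EXAMPLE = cidr_table(["192.0.2.0/24", "# a comment", "198.51.100.7/24"])
```

The resulting file would then be consulted through a `check_client_access cidr:` map placed ahead of the greylisting policy service. Exactly where that sits in the restriction lists depends on the rest of the Postfix configuration, so treat the placement as an assumption too.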

Urwid, a console UI library for Python

mpd + ncmpc - perfect combo for iTunes replacement

I use mpd on my netbook, but I’ve never used ncmpc before. I’ll give it a try, because it looks decent enough.

ldappool: a simple pool for python-ldap

Async: A monadic concurrency library for O'Caml

Tweetstream: Simple Twitter streaming API access

I’ve written something like this in the past myself, but I’ll happily scrap my work if this is good.

Twitter t.co link wrapper FAQ

I need this for a rather cheeky project that’s been on my agenda for a while now.

RFC 684: A Commentary on Procedure Calling as a Network Protocol

Still fresh and relevant, even today.

(If you haven’t realised by now, I’m doing a massive linkdump.)

Introduction to Information Retrieval (PDF download)

This may be useful some day.

Why loading third party scripts asynchronously is not good enough

Loading third party scripts async is key for having high performance web pages, but those scripts still block onload. Take the time to analyze your web performance data and understand if and how those not-so-important content/widgets/ads/tracking codes impact page load times. Maybe you should do what we did on CDN Planet: defer their loading until after onload.

"The IO monad is 45 years old"

Very interesting post. The Landin paper mentioned comes in two parts, and the part cited is actually the second part. The first part is freely available online without having to go through the ACM’s paywall, and is interesting in and of itself, but only marginally relevant to this post.

ztype

A typing tutor Dave mentioned to me before. It’s done as a shoot ’em up in the vein of Galaxian, more or less. Seems like a good way to learn to type, and it’s fun even if you already know how.

uWSGI

uWSGI is a fast, self-healing and developer/sysadmin-friendly application container server coded in pure C.

Born as a WSGI-only server, over time it has evolved in a complete stack for networked/clustered web applications, implementing message/object passing, caching, RPC and process management.

The Imprinted Brain Theory

In a nutshell: Autism and Psychosis/Schizophrenia are duals of one another. Quite interesting.

URL design

In full agreement with this. It’s been sitting in my browser for ages, and I have to pass it on to people, along with a few other links I’ve queued up.

The Twelve-Factor App

In the modern era, software is commonly delivered as a service: called web apps, or software-as-a-service. The twelve-factor app is a methodology for building software-as-a-service apps that:

  • Use declarative formats for setup automation, to minimize time and cost for new developers joining the project;
  • Have a clean contract with the underlying operating system, offering maximum portability between execution environments;
  • Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration;
  • Minimize divergence between development and production, enabling continuous deployment for maximum agility;
  • And can scale up without significant changes to tooling, architecture, or development practices.

The twelve-factor methodology can be applied to apps written in any programming language, and which use any combination of backing services (database, queue, memory cache, etc).

[…]

The Twelve Factors

  1. Codebase: One codebase tracked in revision control, many deploys
  2. Dependencies: Explicitly declare and isolate dependencies
  3. Config: Store config in the environment
  4. Backing Services: Treat backing services as attached resources
  5. Build, release, run: Strictly separate build and run stages
  6. Processes: Execute the app as one or more stateless processes
  7. Port binding: Export services via port binding
  8. Concurrency: Scale out via the process model
  9. Disposability: Maximize robustness with fast startup and graceful shutdown
  10. Dev/prod parity: Keep development, staging, and production as similar as possible
  11. Logs: Treat logs as event streams
  12. Admin processes: Run admin/management tasks as one-off processes

All solid stuff. Go read the site for details. The Hacker News commentary is also good.
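
Factor 3 (config in the environment) is the easiest of the twelve to show in a few lines. A minimal sketch, assuming an app that needs a database URL, a port and a debug flag; the variable names `DATABASE_URL`, `PORT` and `DEBUG` and the `load_config` helper are hypothetical, not something the site prescribes.

```python
import os


def load_config(environ=None):
    """Build a settings dict from environment variables.

    Passing a plain dict instead of os.environ makes this easy to test.
    """
    env = os.environ if environ is None else environ
    return {
        # Required: a missing value raises KeyError at startup, not later.
        "database_url": env["DATABASE_URL"],
        # Optional, with defaults suitable for development.
        "port": int(env.get("PORT", "8000")),
        "debug": env.get("DEBUG", "0") == "1",
    }
```

The same code then runs unchanged in development, staging and production; only the environment differs between deploys.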

Fabric Python with Cleaner API and Parallel Deployment

Is planning essential? And are plans really inconsequential?

Let’s start by examining the fundamental question implied by the title of this post: Is planning really valuable? Or is planning in and of itself worthless because it’s the plan that’s really valuable? If we believe that plans are valuable, planning is the activity of producing plans, and we infer that planning has little or no value independent of the plan it produces. It’s the plan that is valuable, not the planning. We could outsource the planning, or even buy a plan off the shelf and if it was the right plan, we’d be just as successful as if we did the planning ourselves.

Conversely, if we believe that planning is valuable, planning is an activity that involves something more than simply producing a plan. We believe this activity confers some value independently of the plan, and we believe that the plan without the planning is significantly less valuable. Outsourcing planning or buying a plan off the shelf won’t work because we’d lose the value of the planning itself.

[…]

When we think of the plan as a subset of the knowledge obtained through planning, we see it in a different light. It’s like a fossil. It reflects the knowledge, but it isn’t the knowledge. It’s an imprint in sediment outlining the size and shape of the knowledge.

Top-down operator precedence parsing

A good introduction to Pratt parsers.
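
For a flavour of the technique, here is a small Pratt parser for arithmetic that builds nested tuples. It is a sketch written for this note, not code from the linked article: the binding-power numbers, the `Parser` class and the tuple layout are all arbitrary illustrative choices.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/^()])")

# Infix binding powers as (left, right). A right power higher than the left
# gives left associativity; a right power no higher than the left gives
# right associativity.
INFIX = {
    "+": (10, 11),
    "-": (10, 11),
    "*": (20, 21),
    "/": (20, 21),
    "^": (30, 30),
}
PREFIX_POWER = 25  # unary minus binds tighter than "*", looser than "^"


def tokenize(text):
    text = text.rstrip()
    tokens, position = [], 0
    while position < len(text):
        match = TOKEN.match(text, position)
        if match is None:
            raise SyntaxError(f"unexpected character at {position}")
        tokens.append(match.group(1))
        position = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.position = 0

    def peek(self):
        if self.position < len(self.tokens):
            return self.tokens[self.position]
        return None

    def advance(self):
        token = self.peek()
        self.position += 1
        return token

    def expression(self, min_power=0):
        # Prefix position: a number, a unary minus or a parenthesised group.
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            left = int(token)
        elif token == "-":
            left = ("neg", self.expression(PREFIX_POWER))
        elif token == "(":
            left = self.expression(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        else:
            raise SyntaxError(f"unexpected token {token!r}")

        # Infix position: keep absorbing operators that bind tightly enough.
        while True:
            operator = self.peek()
            if operator not in INFIX:
                break
            left_power, right_power = INFIX[operator]
            if left_power < min_power:
                break
            self.advance()
            left = (operator, left, self.expression(right_power))
        return left


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

With these powers, `parse("1 + 2 * 3")` nests the multiplication under the addition, and `parse("2 ^ 3 ^ 2")` groups to the right. Adding an operator is one more entry in `INFIX`, with no grammar to rewrite.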

Git Immersion

To read and to use as a reference.

Fitness linkdump

Guess what one of my resolutions is?

I don’t want to end up buff or anything, just less flabby and to feel better.


And some general fitness links:


And new year resolutions:

Learn you some Erlang for great good!

Fun Erlang tutorial.

A Gentle Introduction to Category Theory - the calculational approach

In these notes we present the important notions from category theory. The intention is to provide a fairly good skill in manipulating with those concepts formally. What you probably will not acquire from these notes is the ability to recognise the concepts in your daily work when that differs from algorithmics, since we give only a few examples and those are taken from algorithmics. For such an ability you need to work through many, very many examples, in diverse fields of applications.

This text differs from most other introductions to category theory in the calculational style of the proofs, the restriction to applications within algorithmics, and the omission of many additional concepts and facts that I consider not helpful in a first introduction to category theory.

Twelve things good bosses believe

  1. I have a flawed and incomplete understanding of what it feels like to work for me.
  2. My success—and that of my people—depends largely on being the master of obvious and mundane things, not on magical, obscure, or breakthrough ideas or methods.
  3. Having ambitious and well-defined goals is important, but it is useless to think about them much. My job is to focus on the small wins that enable my people to make a little progress every day.
  4. One of the most important, and most difficult, parts of my job is to strike the delicate balance between being too assertive and not assertive enough.
  5. My job is to serve as a human shield, to protect my people from external intrusions, distractions, and idiocy of every stripe—and to avoid imposing my own idiocy on them as well.
  6. I strive to be confident enough to convince people that I am in charge, but humble enough to realize that I am often going to be wrong.
  7. I aim to fight as if I am right, and listen as if I am wrong—and to teach my people to do the same thing.
  8. One of the best tests of my leadership—and my organization—is “what happens after people make a mistake?”
  9. Innovation is crucial to every team and organization. So my job is to encourage my people to generate and test all kinds of new ideas. But it is also my job to help them kill off all the bad ideas we generate, and most of the good ideas, too.
  10. Bad is stronger than good. It is more important to eliminate the negative than to accentuate the positive.
  11. How I do things is as important as what I do.
  12. Because I wield power over others, I am at great risk of acting like an insensitive jerk—and not realizing it.

I know I’d fail on many if not all of these, but they’re all properties I’d like to see in any boss of mine.

Sitemaps XML format

37 ways that words can be wrong

Nice compendium of various forms of fallacious reasoning based on the misuse of words.

kqueue tutorial

Common (MySQL) Queries

There are some nice recipes in there.

Mocking patterns: chained calls, partial mocking and open as context manager

Programming the Roland TB-303

I don’t have one, but it’d be cool if I did!

The Programming Language Zoo

On this page you will find on display a number of mini languages which demonstrate various techniques in design and implementation of programming languages. The languages are implemented in Objective Caml.

[…] The languages are not meant to compete in speed or complexity with their bigger cousins from the real world. On the contrary, they are deliberately very simple, as each language introduces only one or two new basic ideas. You should find the source code useful if you want to learn how things are done.

I stumbled across this when I was looking for information on Levy’s call-by-push-value, but there are many other interesting toy interpreters on the page too.

Tony Finch - Some notes on Bloom filters

The Refactoring Manifesto

Python Module of the Week: imaplib

imaplib is probably one of the most horrible and inscrutable parts of the Python standard libraries, not least because IMAP itself is horrible and inscrutable, if very, very useful. This will (probably) help me work with it.

IMAPClient

Promises to be less horrible and more pythonic than imaplib. Might be worth a look.

Best practice in Science and Coding. Holding up a mirror.

If we think about what makes science work; effective communication, continual testing and refinement, public criticism of claims and ideas; the things that make up good science, and mean that I had a laptop to write this talk on this morning, that meant the train and taxi I caught actually run, that, more seriously a significant portion of the people in this room did not in fact die in childhood. If we look at these things then we see a very strong correspondence with good practice in software development. High quality and useful documentation is key to good software libraries. You can be as open source as you like but if no-one can understand your code they’re not going to use it. Controls, positive and negative, statistical and analytical are basically unit tests. Critique of any experimental result comes down to asking whether each aspect of the experiment is behaving the way it should, has each process been tested that a standard input gives the expected output. In a very real sense experiment is an API layer we use to interact with the underlying principles of nature.

Good stuff, and it mentions dexy!

How to prepare: checklist for great talks

Pingback 1.0

To implement.

Why Events Are A Bad Idea (for high-concurrency servers)

Event-based programming has been highly touted in recent years as the best way to write highly concurrent applications. Having worked on several of these systems, we now believe this approach to be a mistake. Specifically, we believe that threads can achieve all of the strengths of events, including support for high concurrency, low overhead, and a simple concurrency model. Moreover, we argue that threads allow a simpler and more natural programming style.

We examine the claimed strengths of events over threads and show that the weaknesses of threads are artifacts of specific threading implementations and not inherent to the threading paradigm. As evidence, we present a user-level thread package that scales to 100,000 threads and achieves excellent performance in a web server. We also refine the duality argument of Lauer and Needham, which implies that good implementations of thread systems and event systems will have similar performance. Finally, we argue that compiler support for thread systems is a fruitful area for future research. It is a mistake to attempt high concurrency without help from the compiler, and we discuss several enhancements that are enabled by relatively simple compiler changes.

I’ve always thought this paper had a somewhat deceptive title.

How to rank products based on user input

Why Vector Clocks are Easy

I don’t currently have any need for vector clocks, but there’s a monolithic system in work I’ll be breaking up some time in the new year, and yeah…

Private Methods are a Code Smell

TL;DR: If you’ve private methods in your code, ask yourself if you might be violating the single responsibility principle. If you are, factor them out into a separate class; if you’re not, then why are they hidden?

Test Driven Development and the Meaning of 'Done'

[P]erhaps the most valuable effect of TDD is just a side effect of upfront unit testing: it relieves schedule pressure and allows teams to delay the point at which code can be called done.

The Virtues of Monitoring

Software Development Antipatterns

What we can learn from procrastination

How to write 1000 words

Derailing for Dummies

Eeeevil! :-)

UglifyJS: a JavaScript parser/compressor/beautifier

This package implements a general-purpose JavaScript parser/compressor/beautifier toolkit. It is developed on NodeJS, but it should work on any JavaScript platform supporting the CommonJS module system (and if your platform of choice doesn’t support CommonJS, you can easily implement it, or discard the exports.* lines from UglifyJS sources).

The tokenizer/parser generates an abstract syntax tree from JS code. You can then traverse the AST to learn more about the code, or do various manipulations on it.

Blosc: A blocking, shuffling and loss-less compression library

Transmitting data from memory to CPU (and back) faster than a plain memcpy()

I don’t quite believe the speed claim, but it looks like a decent alternative to lzo.

Standardizing Python WSGI deployment