I saw this video today about where innovation and good ideas come from.
urllib3 seems to be a long-abandoned project on PyPI. However, it has some features (like re-using connections, aka HTTP Keep-Alive) that are not present in the Python 2 version of urllib2. Another package that provides HTTP Keep-Alive is httplib2.
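To make the idea of connection re-use concrete, here is a minimal sketch. It does not use urllib3 or httplib2 at all: it assumes Python 3 and only the standard library (`http.client` plus a throwaway local `http.server`), and the names `CountingHandler` and `count_connections` are made up for this illustration. The server counts how many TCP connections it accepts, so you can compare a client that keeps one connection open with a client that reconnects for every request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # HTTP/1.1 connections are persistent by default
    connections = 0

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        CountingHandler.connections += 1
        super().setup()

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        # Content-Length is required so the client knows where the body ends
        # and can send the next request on the same connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def count_connections(n_requests, reuse):
    """Send n_requests GETs to a local server; return connections it accepted."""
    CountingHandler.connections = 0
    server = HTTPServer(("127.0.0.1", 0), CountingHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        conn = None
        for _ in range(n_requests):
            if conn is None:
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()  # read fully before re-using the connection
            if not reuse:
                conn.close()
                conn = None
        if conn is not None:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections
```

Calling `count_connections(5, reuse=True)` and `count_connections(5, reuse=False)` shows the difference: with Keep-Alive the TCP handshake is paid once, otherwise once per request.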
Benchmark results on a single host
Keep-Alive can significantly speed up your scraper or API client if you’re connecting to a single host, or a small set of hosts. This example shows the times spent downloading random pages from a single host, using both urllib2 and urllib3.
urllib2 vs. urllib3 benchmark results
The benchmark script
Here’s a script that will benchmark urllib2 against urllib3 for the domain theoatmeal.com and write the results to a CSV file (easy to import into a Google Docs spreadsheet to generate a nice chart).
If you run it, it will also print the result summary, something like this:
Starting urllib2/urllib3 benchmark...
 * crawling: http://theoatmeal.com/
 * crawling: http://theoatmeal.com/comics/party_gorilla
 * crawling: http://theoatmeal.com/comics/slinky
 * crawling: http://theoatmeal.com/blog/floss
 ...
Finishing benchmark, writing results to file `results.cvs`
Total times:
 * urllib2: 183.593553543
 * urllib3: 95.9748189449
As you can see, urllib3 appears to be nearly twice as fast as urllib2 in this run.
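For anyone who wants to run a similar comparison, the timing and CSV part can be sketched roughly like this. This is not the original benchmark script: it assumes Python 3, and `benchmark`, `rows_to_csv` and the `fetchers` mapping are hypothetical names. Each fetcher is just a callable that downloads one URL, so you can plug in whichever HTTP clients you want to compare.

```python
import csv
import io
import time


def benchmark(fetchers, urls, clock=time.perf_counter):
    """Time every fetcher on every URL.

    fetchers: dict mapping a label to a callable taking a URL (assumed shape).
    Returns (rows, totals): one row of timings per URL, plus the sum per label.
    """
    rows = []
    totals = {name: 0.0 for name in fetchers}
    for url in urls:
        row = {"url": url}
        for name, fetch in fetchers.items():
            start = clock()
            fetch(url)
            elapsed = clock() - start
            row[name] = elapsed
            totals[name] += elapsed
        rows.append(row)
    return rows, totals


def rows_to_csv(rows, fieldnames):
    """Render the per-URL timings as CSV text, header line first."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Writing the returned text to a file gives you something a spreadsheet can import directly. The `clock` parameter is only there so the timer can be swapped out when testing.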
Today I asked a question on StackOverflow on how to attach a function to the browser’s DOM ready event, in a cross-browser way, but without exporting any globals (keeping everything in an anonymous function’s closure) and without including any external file. As a result, with some help from a friendly StackOverflow user, I put together a code snippet that:
- takes a single function as argument,
- attaches that function to the DOM ready event in all browsers supported by jQuery,
- is idempotent (will never fire the given function twice),
- does not export any globals,
- compiles down to less than 590 bytes (less than 300 bytes gzipped),
- is based on the jQuery source code (I take no credit for it).
I take no credit for writing this script. If you want to use it, please include jQuery’s license comment.
Here is a CoffeeScript version:
This blog, for example, shows the static content as soon as possible, allowing its visitors to read the main content (the article) while the not-so-important content (like Facebook Like buttons, “Web 2.0” widgets and all that crap) is on its way from the server.
I use the following snippet to load external JS:
I put all that stuff in a closure so nothing gets exported to the global namespace. Note that the setTimeout trick is from here.
- You can load less.js too, just trigger a less.refresh() after it has been loaded.
- Have a look at Richard Neil Ilagan’s implementation as well.
The former is actually an ongoing project, involving many interesting technologies, such as working with the eBay API, extending the Django admin interface, geocoding (and reverse geocoding), CMS, domain and subdomain management, etc. The other project is a small app that works based on the users’ location.
As a result of these two weeks, Sproud Ventures UG (the company that sponsored the event) will open-source a Python package for doing geolocation lookups and other useful things in web applications. The package contains a raw WSGI middleware for doing IP-based, keyword-based and coordinate-based lookups. Other handy features include Django template tags and a template context processor. I’ll write about it in detail when it gets released (that is, when I find some time to improve test coverage and review the documentation).
- LESS, a very neat tool for writing structured, object-like CSS. Lets you define your template colours in a separate library, import it and use in other styles, use variables, basic arithmetic, and then compiles everything into a valid, nicely-formatted CSS file. I’ll definitely use it in my future projects.
- Always use twod.wsgi when working with Django. Makes life much easier.
- Use even more third-party tools. There are so many great libraries out there. There are a lot of crappy ones too, but some of them can be improved. +1 for publicly forking projects on GitHub and BitBucket.
- Use YAML, even more.
A few days ago one of my customers asked me to put together a very simple Python script that would search through a text and find all the words that contain both letters and numbers. As simple as it may be, I’ve decided to make it a little more robust by wrapping it in a class that can be configured to split words and check patterns for each word.
This is just too simple to be released as a module, but I’ll put the code here in case anyone needs it. I’m placing it in the public domain.
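A minimal sketch of what such a class might look like is below. It is an illustration written for Python 3, not the code from the original post: the class name `WordFinder`, its parameters and the default patterns are all assumptions. The text is split into words with one regular expression, and a word is kept only if every "check" pattern matches somewhere inside it.

```python
import re


class WordFinder:
    """Hypothetical helper: find words that match all of the given patterns."""

    def __init__(self, split_pattern=r"\s+", word_patterns=(r"[A-Za-z]", r"[0-9]")):
        # split_pattern decides where one word ends and the next begins.
        self.split_re = re.compile(split_pattern)
        # A word qualifies only if every one of these patterns is found in it.
        # The defaults require at least one ASCII letter and one digit.
        self.word_res = [re.compile(p) for p in word_patterns]

    def find(self, text):
        # Splitting can yield empty strings at the edges, so drop those first.
        words = (w for w in self.split_re.split(text) if w)
        return [w for w in words if all(r.search(w) for r in self.word_res)]
```

With the defaults, `WordFinder().find("order a1b2 from shop42 at 10 pm")` returns `['a1b2', 'shop42']`. Passing a different `split_pattern`, for example `r"[^A-Za-z0-9]+"`, treats punctuation as a word boundary as well.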