my blog. for you.

Let’s talk digital.

I’m an independent IT consultant and entrepreneur in the Internet and software business. I’m interested in design, enterprise applications, web apps and SaaS products. I design and develop business solutions and applications. I help companies improve software quality and transfer knowledge, e.g. with Angular and Spring Boot.

spaCy – A fast natural language processing library

spaCy is a rather new library (written in Python and Cython) for performing various NLP-related tasks such as tokenization, POS tagging and syntactic parsing. The authors claim it's faster (in some cases a lot faster) than other common solutions such as NLTK or Stanford's CoreNLP, and peer reviews seem to corroborate these claims. spaCy is English-only for now. If you're working on performance-intensive NLP tasks and that's no deal-breaker for you, you might want to check it out. The source code is available ... Read more

The Turing Exception by William Hertling

About a year ago I wrote about "Avogadro Corp" by William Hertling, the first book of his Singularity Series. A week ago I finished reading "The Turing Exception", the fourth and final instalment and a worthy close to the series. The Turing Exception follows the common patterns of the series. Avogadro Corp was set in 2015, with each sequel taking place 10 years after the previous one. With The Turing Exception we've now arrived at around 2045 and in a world that in some ... Read more

I don’t want an app for that

Scott Adams of Dilbert fame recently posted an article on why your phone interface is a legacy train wreck. He argues that the way we interact with our smartphones goes all the way back to the beginning of desktop computing. In spite of what Apple probably would have us believe - in spite of swiping, tapping and multitouch - we're still largely using our computing devices as if they were a 1987 IBM PC running Microsoft Word or Excel: When trying ... Read more

Deep Learning for NLP

Richard Socher, Chris Manning and Yoshua Bengio have created a tutorial on "Deep Learning for NLP (without Magic)". The tutorial includes slides and two videos of talks held on the subject. It deals with how deep learning algorithms can be applied in natural language processing. Deep learning is a set of algorithms and models which work under the assumption that observed data is generated from multiple layers of hidden representations that interact with each other. Although not really new and for some ... Read more

Hemingway: Readability Scores And Smart Suggestions On Style

Hemingway is an interesting new web app that not only assigns a readability score to your text but also makes smart suggestions on how to simplify it so that it is easier to understand. Having worked on readability algorithms before, I think this is a well-designed take on improving the usefulness of readability scores. Knowing that your text scores high or low in terms of readability only gets you so far. Hemingway additionally uses colour-coding to make suggestions such as: split ... Read more
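To give an idea of what a readability score is, here is a minimal Python sketch of one widely used formula, the Automated Readability Index (ARI). This is not a description of how Hemingway computes its score; the function name and the simple regex-based splitting of sentences and words are assumptions made for illustration only.

```python
import re


def automated_readability_index(text):
    """Compute the Automated Readability Index (ARI) of a text.

    ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

    Sentence and word splitting are deliberately naive (an assumption
    of this sketch); abbreviations such as "e.g." will be miscounted.
    """
    # Treat runs of ., ! or ? as sentence boundaries.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # A word is a run of letters, digits or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    # ARI counts letters and digits only, so apostrophes are skipped.
    characters = sum(ch.isalnum() for word in words for ch in word)
    return (4.71 * characters / len(words)
            + 0.5 * len(words) / len(sentences)
            - 21.43)


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    dense = ("Notwithstanding considerable institutional opposition, "
             "the committee unanimously ratified comprehensive legislation.")
    print(automated_readability_index(simple))
    print(automated_readability_index(dense))
```

Short words and short sentences lower the score, long ones raise it. A single number like this says little about which sentences to change, which is exactly the gap that per-sentence suggestions fill.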

Closer than you think

I've recently finished reading Avogadro Corp by William Hertling and the sequel A.I. Apocalypse. These books deal with the idea of how artificial intelligence might come about today or in the near future. The story's main premise is the eponymous Avogadro Corp, a thinly disguised Google. This company, whose name is conveniently related to a large number as well, offers a wide range of Internet services: search, a web-based office suite, web-based eMail (AvoMail ...) and its own smartphone OS (AvoOS). Sound familiar? The story ... Read more

Natural Language User Interfaces And Internet Search

Recently, there was an article at Wired about IBM’s Watson and how IBM might be able to supersede Google as the dominant search engine by providing a question-answering kind of search engine. Every few years the idea of a natural language / semantic / question answering search engine crops up again. Indeed, natural language understanding is quite relevant for the crawling and indexing part of information retrieval systems and Google is very good at that. Just look at their quite formidable automatic translation ... Read more

Named Entity Recognition: Tools And APIs

Named entity recognition is a subtask of information extraction. It deals with extracting the names of persons, organizations and locations, as well as expressions of time, quantities, monetary values, percentages etc., from unstructured or semi-structured data such as eMails or web pages. Here are a few useful tools and APIs that provide named entity recognition functionality: AlchemyAPI: REST API that provides a number of natural language processing (NLP) and information extraction features; DBpedia Spotlight: automatically links DBpedia resources; OpenCalais: NLP API / web service by Thomson Reuters. Read more
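As a rough illustration of what extracting such entities means, the following Python sketch pulls monetary values and percentages out of plain text with regular expressions. It does not use any of the tools listed above; the labels, patterns and the `extract_entities` function are hypothetical and only cover the simplest cases. Names of persons, organizations and locations generally need statistical models or knowledge bases rather than patterns like these.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions for this sketch).
PATTERNS = {
    # A currency symbol followed by digits, with optional , or . groups.
    "MONEY": re.compile(r"[$€£]\s?\d+(?:[.,]\d+)*"),
    # A number, optionally with a decimal part, followed by a percent sign.
    "PERCENT": re.compile(r"\d+(?:\.\d+)?\s?%"),
}


def extract_entities(text):
    """Return (label, matched_text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    # Sort by position in the text so results read left to right.
    return [(label, value) for _, label, value in sorted(found)]


if __name__ == "__main__":
    sample = "Revenue rose 12.5% to $3,400.50 last year."
    for label, value in extract_entities(sample):
        print(label, value)
```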

Topicalizer – an information extraction suite – now open source

Topicalizer is a suite of text analysis and information extraction tools that I developed. It used to be available at http://www.topicalizer.com. Unfortunately, I no longer have the time to maintain it properly, which is why I'm open-sourcing the code for others to learn from and build upon: https://github.com/BjoernKW/Topicalizer Read more