Cleaning Up Sneaky JavaScript

Software can phone home. A lot of software does this by “forgetting” some JavaScript and fonts in the HTML pages shipped inside the desktop program. These are some notes about cleaning up Zeal, an “offline” documentation browser.

As more and more software is actually just a browser, this problem is common. A lot of popular software actually runs a browser inside your local desktop program, for example by using Electron or the Qt embedded browser.
You can detect phoning home with network sniffing (e.g. Wireshark), a logging proxy, sandbox or container logging (e.g. AppArmor, Docker, SELinux), and by reading the source code, if available.
These are brief notes on how to clean the Dash docsets offered by the Zeal documentation browser and how to configure Zeal not to connect to the Internet. Sadly, this is needed for the version of Zeal available in the Ubuntu default repositories.
I definitely did not expect the Zeal “offline” documentation browser to send information about my work on my computer to many third parties. But it did: www.google-analytics.com, srv.carbonads.net, www.seedanddew.com… Looking at the Zeal public bug list, this comes as no surprise to its developers. But it did come as a surprise to me, and I feel Ubuntu should add a warning when launching the program.
The telemetry code is built into the docsets, which are both offered for download by the program GUI and required for using the program.

Bypassing the Problem with Proxy and Cleanup

To limit the damage, set Edit: Preferences: Network: Proxy to “Manual proxy configuration”, with HTTP proxy: localhost and Port: 9876 (any closed port will do). Requests are then sent to a proxy that does not exist and fail instead of reaching the Internet.
To reduce log spam in the proxy, Wireshark, intrusion prevention system or containerization logs, the docsets can be cleaned. This cleaning does not prevent information leaks; it only prevents unnecessary log lines in the software that watches for and blocks these connections.
This cleaning is manual work; the commands below worked for me.
Downloaded docsets are stored in $HOME/.local/share/Zeal/Zeal/docsets.
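
Before cleaning, it can help to see which external hosts the docsets mention at all. The following is only a rough sketch: list_external_hosts is a hypothetical helper name of my own, and it counts every http(s) URL it finds, so ordinary links in the documentation show up next to the trackers and fonts.

```shell
#!/bin/sh
# list_external_hosts DIR
# Hypothetical helper (not part of Zeal): prints each host that appears
# in an http:// or https:// URL under DIR, with a count, most frequent first.
# It over-reports: plain <a href> links are counted as well.
list_external_hosts() {
    grep -rhoiE 'https?://[a-z0-9.-]+' "$1" 2>/dev/null \
        | sed -E 's|^[a-zA-Z]+://||' \
        | tr 'A-Z' 'a-z' \
        | sort | uniq -c | sort -rn
}

# Example call, using the docset directory named above:
#   list_external_hosts "$HOME/.local/share/Zeal/Zeal/docsets" | head -n 20
```

Running it again after the cleanup gives a quick impression of what is still referenced.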

# Run from the directory that contains docsets/
$ cd "$HOME/.local/share/Zeal/Zeal"
# Turn <script> and <noscript> elements into HTML comments (g flag: every tag on a line)
$ find docsets/ -type f -iname '*.htm*' -print -exec perl -p -i -e 's/<script/<!-- disabled-script /gi' '{}' \;
$ find docsets/ -type f -iname '*.htm*' -print -exec perl -p -i -e 's/\/script>/disabled-script -->/gi' '{}' \;
$ find docsets/ -type f -iname '*.htm*' -print -exec perl -p -i -e 's/<noscript/<!-- disabled-noscript /gi' '{}' \;
$ find docsets/ -type f -iname '*.htm*' -print -exec perl -p -i -e 's/\/noscript>/disabled-noscript -->/gi' '{}' \;
# Point Google Fonts URLs in stylesheets at a host name that cannot resolve
$ find docsets/ -type f -iname '*.css' -print -exec perl -p -i -e 's|https://fonts\.googleapis\.com/|https://fonts.googleapis.com.invalid/|gi' '{}' \;
$ find docsets/ -type f -iname '*.less' -print -exec perl -p -i -e 's|https://fonts\.googleapis\.com/|https://fonts.googleapis.com.invalid/|gi' '{}' \;
# Delete bundled JavaScript files
$ find docsets/ -type f -iname '*.js' -exec rm -v '{}' \;
# Comment out remote @import rules (stop at the first closing parenthesis)
$ find docsets/ -type f -iname '*.css' -print -exec perl -p -i -e 's|(\@import url\(http[^)]*\);)|/* disabled $1 */|gi' '{}' \;

How It Could Be Fixed

As a first step, Ubuntu should add a warning that this program logs your “offline” documentation reading habits and sends them to third parties.
Secondly, Zeal (or a version of Zeal) should disable all outgoing network connections that are not specifically requested by the user. Browsed pages should never be allowed to connect to the Internet. An offline documentation browser should never send my data to www.google-analytics.com, srv.carbonads.net or www.seedanddew.com.
Update: revised the wording and tags, and cleaned up the find-perl cleaning script.
