D3.js Wow. Some notes on SVG, D3, and data visualizations

Discussion:

David Adams

9 years ago

Almost a month back I wrote in and asked for some suggestions related to
data visualizations, SVG, and JavaScript. Starting from, well, nowhere I
was very kindly given a stack of excellent suggestions. I've been digging
in and thought I'd pass along a few thoughts and links for others and the
archives. This isn't super well organized and other people will know a lot
more than me about every part of this post...hopefully someone will think
to add some bits and pieces. If I've gotten something wrong, feel free to
offer corrections.

For background, every couple of years I seem to run into something that
gets me really excited and fanatical. To those I've abused in the past with
my fanaticism, apologies. To those of you I'm about to abuse...apologies...but
you've been warned. Here's a quick summary: SVG isn't hard, D3 is
mind-bendingly fantastic and, as it turns out, easy to get a lot done with
in a hurry. How does all of this fit in with 4D? That depends on what
you're doing. In my case, I'm using 4D to store, grind, and search a lot of
data and then spit it out in data formats that can be charted.

-------
SVG
-------
First things first: SVG. I noticed an email the other day that Keisuke
Miyako is presenting a component of the Portland Summit post training
called "SVG Cookbook For 4D Programmers." I can't find a direct link to the
write-up, but this gets you to the top page, then click "Post-class" and
then click on "Read More" next to the module description:

http://summit.4d.com/us_en/training/

Chances are, this part of the course will be worth the price of admission
on its own.

In my case, I'd never messed around with SVG so I thought that I'd try it
out by hand using a text editor and then building up to some code that
spits out text. I still haven't looked at 4D's SVG commands and,
ultimately, am more likely to use 4D to send out data that JavaScript
will process into SVG, as needed. I *highly recommend* trying out some SVG
production by hand. Make some overlapping squares or something - try
building a tiny bar chart. Why? Well, you'll see in a big hurry that basic
SVG is easy to read and write and is pretty darn useful. You'll also
probably quickly see that writing tons of SVG by hand is tedious and that
it begs for automation. Some tips:

* With SVG, you declare what you want and a renderer in the browser
interprets your instructions and generates a graphic. Nice.

* SVG is clickable. Actually, that's just the headline - SVG nodes have a
DOM and event handlers (mouseover, click, etc.) can be attached to nodes.
You can bake the assignments right into the raw SVG or you can assign them
with JavaScript, either manually (and tediously) or by using a jQuery/D3
style selection. (D3 has a really sweet approach to selections and method
chaining that makes selecting a bunch of objects one line of code and
iterating over them one more line of code. Making the iteration loop
invisible is pretty nice.)

* Oh yeah, the DOM. With JavaScript you always hear about "the DOM." For
the uninitiated, that means the Document Object Model...which is a
meaningless phrase, IMO. It's a tree of nodes in the document that you can
read, write, and modify through JavaScript. An external SVG file has its
own DOM with some slight and subtle differences from the standard HTML DOM.
Most CSS selectors are the same...but not all. These are all the sorts of
details I'll want to avoid...so...if you go down this road, I suggest:

-- Consider using the HTML5 <svg> tag as then the SVG is part of the parent
page's DOM without any fuss.

-- Yeah, same-origin policy restrictions are a pain. For much of this
stuff, you pretty well need to have a Web server running even if you're
working locally. 4D can be a great help here! It's got a nice Web server
built in. You can also use Apache or whatever.

-- Get a library to take care of the tedium....D3....

* If you do use stand-alone SVG, know that you can include CSS and
JavaScript directly in the SVG file itself. This makes the JavaScript and
the SVG nodes all part of the same node tree (DOM) so there's no hassle
there.

* There are a lot of great SVG resources on-line but the one I keep going
back to is here:

http://tutorials.jenkov.com/svg/index.html

Microsoft's pages are also good but don't have that personal touch.

* Lots of SVG code includes an element named <g> for group. It's not a
great name, but it's short...the same (or a similar) concept is found in
any graphics programming environment but with a different name. The idea is
that you get to group elements so that you can operate on them as a block.
Say you've got a square and some text - you can rotate them as a group.
Beyond that, <g> lets you work with a local coordinate space and then
offset it in a block. Eh? Say you're drawing horizontal bars. You can keep
everything oriented to a local y without worrying if you're the first bar
or the fiftieth. Within the object, you can then move the entire block
around to position it vertically. Here's what it looks like to move a block
down 100 units (you can set your own and even mix units, depending on your
needs):

<g transform="translate(0,100)">

At first I didn't appreciate <g> and ignored it. It turns out that when
you're looping over a data source to generate objects, it's pretty darn
handy not to have to keep track of a whole block of objects' position in
the final composition. This will probably only make sense if you're trying
to code this stuff, but I toss it out there so that it's in the back of
people's minds. Oh, and going this route can also help avoid some hassles
with alignment that are easy to get when laying out text against other
objects like lines or rectangles.
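
As a rough sketch of the <g> idea above, here is some JavaScript that spits
out SVG text for a tiny horizontal bar chart. The function name
barChartSvg, the sizes, and the sample rows are all made up for
illustration. It assumes the values are positive and that the labels
contain nothing that needs XML escaping.

```javascript
// Sketch only: barChartSvg and its layout numbers are hypothetical.
function barChartSvg(rows, barHeight) {
  var width = 300;
  var labelWidth = 90;  // space reserved on the left for the text labels
  var max = Math.max.apply(null, rows.map(function (r) { return r.value; }));
  var scale = (width - labelWidth) / max;  // data units -> pixels

  var groups = rows.map(function (r, i) {
    // Each bar lives in its own <g>. Inside the group, y always starts
    // at 0; translate() moves the whole block down to its row.
    return '  <g transform="translate(0,' + (i * barHeight) + ')">\n' +
           '    <text x="0" y="' + (barHeight / 2) + '" dy=".35em">' +
           r.label + '</text>\n' +
           '    <rect x="' + labelWidth + '" y="1" width="' +
           Math.round(r.value * scale) + '" height="' + (barHeight - 2) +
           '" fill="steelblue"/>\n' +
           '  </g>';
  });

  return '<svg xmlns="http://www.w3.org/2000/svg" width="' + width +
         '" height="' + (rows.length * barHeight) + '">\n' +
         groups.join('\n') + '\n</svg>';
}

// Made-up sample data.
var svgText = barChartSvg([
  { label: 'A', value: 10 },
  { label: 'B', value: 5 },
  { label: 'C', value: 20 }
], 20);
console.log(svgText);
```

The text and rectangle inside each group never need to know which row they
are in; only the translate() on the enclosing <g> changes from bar to bar.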

Tip: Big graph? Put it into a smaller <div> that's set to show scrollbars
when the contents are too big to display completely. Works fine.

-------
D3
-------
D3 (http://d3js.org/), which stands for "Data-Driven Documents", is a very
big noise in the data visualization world. To connect things up, it's
generating SVG underneath it all and you can get at the generated SVG
nodes/objects, if you need to. So, it's kind of good to have an idea of how
SVG works underneath. Not necessarily required, but helpful. D3 has had
some big upgrades in the last few years so it's worth another look if you
considered and rejected it years ago.

I also had a look at Raphael (http://raphaeljs.com/) and HighCharts (
http://www.highcharts.com/), both of which look good. Actually, HighCharts
looks awesome but I figured I'd go with the open-source D3...but this is a
hobby project...for a commercial project, it would be worth looking at
HighCharts. Many of you will remember that Stephen Orth pretty much got a
"best in show" award at the 2013 Summit for his High Charts presentation.
For attendees, you can revisit the presentation at:

http://kb.4d.com/assetid=77104

Anyway, I didn't go with Raphael as it may be a dead end now. At the least,
it has nothing close to the momentum of D3. Nothing does, so far as I can
tell. A bunch of thoughts:

* Amazing piece of work. Just amazing.

* There are thousands of examples and a very active community. Here's a
starting point that's not as overwhelming as some:
https://github.com/mbostock/d3/wiki/Gallery

* Try this tutorial:

http://bost.ocks.org/mike/bar/

You'll want a very minimal level of JavaScript knowledge (very basic), a
text editor, and a browser.

* It's easy to install. Not a big deal to some, but dealing with JS library
dependencies is sometimes an enormous pain so I appreciate something that's
easy to install. You copy in a library and link to it from your script. And
now you're done.

* D3 is a platform and tons of libraries have been built on top. I've been
experimenting with dimple (http://dimplejs.org/) which makes it a snap to
produce some complex charts with a few lines of code. As an example, I've
been using 4D to download and grind up data sets related to global bird
distributions at various taxonomic levels. I've got hundreds of thousands
of sums and comparisons stored that I can export as graphable series. One
complex bubble chart graphs this data:

-- The y axis is the overall land area of the location. So, Brazil is way
up top and Belgium is pretty far down.
-- The x axis shows degrees of similarity with a starting point. So, Canada
is very similar to the lower 48 and not very similar to Australia.
-- The bubbles are sized based on total species counts, so Peru gets a huge
bubble and New Zealand gets a small bubble.
-- ...and then each bubble is colored based on ecological region.

How much JavaScript does that all take? About ten lines...and some of those
are decorative. In this case, "decorative" means formatting numbers and
setting axis labels...no "chart junk." So, a few lines of code to get what
Ed Tufte called "high data density."

* Speaking of Saint Tufte, I know that many of the old guard have attended
his trainings or read his books. Well, the D3 guys certainly have. They're
part of the modern information visualization movement that's about
*respecting the data.* I think that the main guy is now at The New York
Times. There are no pointless drop shadows, no three dimensional graphs of
two dimensions of data (Hello! Excel, I'm looking at you) and a real
emphasis on getting the math right for proportionate displays. So, yeah,
this is the real deal. Once you've got the hang of it a bit, it's far
simpler (and better) than charting in, well, Excel.

* A quick word about data. If you're using D3, you can pull in data using a
variety of formats including text, HTML page elements, JSON (or "JSON with
padding"), xml, csv, or tsv (tab-separated values). Some comments:

-- JSON is super nice because then each element is typed correctly.

-- JSON is terrible because the "name": part of each element is repeated
over and over. And over. If you have a big data set, perhaps TSV is a good
plan.

-- Same-origin policy! Grrr. Browsers complain. Well, they complain at the
console; in the standard Web window you just don't get what you expect.

-- TSV (etc.) is great because it's so compact.

-- TSV (etc.) is great because you can have numbers with commas and so on.

-- TSV (etc.) is a pain because everything is imported as a string. There's
no schema or meta-data to guide D3 in how to interpret the data. You can do
a post conversion within your JS, like this fragment:

d3.tsv('./bubble_data/locations.tsv', function(data) {
  /* Convert strings to numbers. If you use JSON instead of TSV, you don't
     need this step. Callback style as used with D3 at the time of writing;
     newer D3 releases return a promise from d3.tsv() instead. */
  data.forEach(function(d) {
    d['Total Species'] = +d['Total Species'];
    d['Shared Species'] = +d['Shared Species'];
    d['Shared Percent'] = +d['Shared Percent'];
    d['Endemic Percent'] = +d['Endemic Percent'];
    d['Square KMs'] = +d['Square KMs'];
  });
  /* ...chart-building code that uses data goes here... */
});

Is that a good example? I don't know...it might be a terrible example, but
it does show the kind of code to look for. Note that there's a solid
argument saying that this makes the JS and the source data a bit too
intimate. Nice to have clean data with type definitions inside of it that
is simply passed through to D3. If you feel this way, use JSON.

-- You can use filter() to extract series from your import source. (D3
selections have a filter() method, and the imported rows are a plain
JavaScript array with its own filter().) So, you can have a shared data set
up on the server and then pull out only the data range or columns you need,
for example. (The D3 examples filter by date but that's not the only
option...it's just a good example.)
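
To make the TSV points above concrete, here is a minimal sketch in plain
JavaScript, with no D3 involved. parseTsv and toNumbers are hypothetical
helper names and the rows are invented sample data. It shows that every TSV
value arrives as a string, converts the named columns with unary +, and
then pulls out a subset with the standard array filter().

```javascript
// Sketch only: parseTsv and toNumbers are hypothetical helpers, not D3 calls.
function parseTsv(text) {
  var lines = text.trim().split(/\r?\n/);
  var headers = lines[0].split('\t');
  return lines.slice(1).map(function (line) {
    var cells = line.split('\t');
    var row = {};
    // No schema here: every value is stored as a string.
    headers.forEach(function (h, i) { row[h] = cells[i]; });
    return row;
  });
}

// Convert the named columns from strings to numbers, in place.
function toNumbers(rows, columns) {
  rows.forEach(function (d) {
    columns.forEach(function (c) { d[c] = +d[c]; });
  });
  return rows;
}

// Invented sample data, using column names from the fragment above.
var text = 'Location\tTotal Species\tShared Percent\n' +
           'A\t120\t75.5\n' +
           'B\t40\t10\n' +
           'C\t300\t42\n';

var rows = toNumbers(parseTsv(text), ['Total Species', 'Shared Percent']);

// Array.prototype.filter extracts just the rows of interest.
var big = rows.filter(function (d) { return d['Total Species'] > 100; });
console.log(big.map(function (d) { return d.Location; }));
```

One caveat with the unary + conversion: a value with a thousands separator
such as "1,200" converts to NaN, so commas would need to be stripped first.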

-------
Tools
-------
Right, you need tools to work with JavaScript. In my case, what bugs me
most is not having a great integrated syntax checker like 4D's. It. Makes.
Me. Uncomfortable. I've tried a bunch of tools and have a bunch more to
try. For now, what's working okay is Microsoft's Visual Studio Code +
ESLint. ESLint is the secret sauce. It's an easy-to-install extension that
highlights errors and warnings. What errors and warnings? That's up to you.
It will flag all sorts of tiny formatting problems (no space after function
name!) if you want it to, but it's deeply configurable. It's worth sitting
down with a few small scripts and messing with the config for 45 minutes
until you've got the level of errors you like. It's still early days for
Visual Studio Code on OS X but it's stable. But no code folding yet :(

I also like BBEdit but, well, it's just not all that when it comes to some
of this.

Every WebKit-based browser on earth has nice developer tools built in now.
Nice. Visual Studio Code has a debugger that installs in Chrome but I
haven't fired it up yet.

There are zillions of other choices - whatever works is fine. I hadn't
realized how much not having something like ESLint was bugging me...figured
someone else might feel the same.

Oh, set up a local Web server to avoid same-origin hassles from file://
references and to get a more realistic outcome prior to deployment. 4D is a
good choice for this, obviously.

-------
Bonus
-------
Hey, this has nothing to do with any of the above, but check this out:

http://thetruesize.com/

Ever hear "The newest US marine reserve is twice the size of Texas!" Okay,
great. How big is Texas? I have no idea. Oh wait, it's half the size of the
new marine reserve. Hmmm. It's impossible to get a grasp of relative size
unless at least one of the things is something you know...and everyone
knows different places. With "The True Size of..." site, you can find a
place you know, and drag its outline over another place using accurate
projections. I have been wanting this very functionality for years and figure
that some other folks here might enjoy it.
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ: http://lists.4d.com/faqnug.html
Archive: http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub: mailto:4D_Tech-***@lists.4D.com
*******************************************

Peter Jakobsson

9 years ago