The author

The 100 Year Web

Steven Pemberton, CWI, Amsterdam

About me

Tutored at university by Dick Grimsdale, who built the world's first transistorised computer. He was himself a tutee of Alan Turing.

My second real job was coincidentally at Turing's old department, working on software for computer number 5 in the series that he had worked on.

Co-designed the programming language that Python is based on.

Fifth person on European Open Internet

Organised two workshops at the first Web conference in 1994.

Which led to W3C: CSS, HTML, XHTML, RDFa, XForms, ...

A typical project meeting

A project meeting

Discussing HTML

A snowy Boston

New College, Oxford, 1379

New College Dining Hall
As reported by Stewart Brand in How Buildings Learn

Built in 1379, New College had a dining hall with huge oak beams in the roof.

Eventually the beams needed replacing. But where do you find oak beams?

They approached the University forester, and asked him.

"Which college are you from?" "New College." "Well, I've got your trees".

It turns out that around the time that New College was built, they planted new trees to be ready for when they would need them.

We don't see that sort of attitude much these days.

2018

Sixty years since the start of public computing

Fifty years since the introduction of the programming language Algol 68

Thirty years since the open internet became truly international

Twenty years of XML

XML

Good stuff.

No second-system effect.

Admittedly some small mistakes, but basically solid good design.

Document model.

Excellent tool chain.

Modularisation.

XHTML+SVG+MathML in the browser (2002)

https://www.w3.org/TR/XHTMLplusMathMLplusSVG/

XHTML+SVG+MathML

The New Web, by programmers, for programmers

HTML5 has changed the Web.

Some parts are good, but mostly it is based on a lack of proper design, and a lack of understanding of design principles and how to design notation.

I don't believe that HTML5 can lead the Web to its full potential.

Declarative

A Declarative Definition

We learn in school what numbers are, and how to add, subtract, multiply and divide.

However, when we get to square roots, we are only told:

The square root of a number n is the number r such that r × r = n.

This is a declarative definition. It tells you what something is, it tells you how to recognise it, but it doesn't tell you how to calculate it.

Most people know what a square root is; few people leave school knowing how to calculate one.

A Procedural Definition

So take a look at a procedural definition of square root:

function f a:
{
   x ← a
   x' ← (a + 1) ÷ 2
   eps ← 1.19209290e-07
   while abs(x − x') > eps × x: 
   {
      x ← x'
      x' ← ((a ÷ x') + x') ÷ 2
   }
   return x'
}
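
For comparison, here is a minimal runnable sketch of the same procedure in Python. The function name newton_sqrt and the guards for zero and negative input are assumptions added for the sketch; they are not part of the pseudocode above, which would never terminate cleanly for a = 0.

```python
def newton_sqrt(a: float, eps: float = 1.19209290e-07) -> float:
    """Approximate the square root of a by repeated averaging (Newton's method)."""
    # Assumed guards, not in the pseudocode: negative input has no real root,
    # and a = 0 would otherwise end in a division by zero.
    if a < 0:
        raise ValueError("square root of a negative number is not real")
    if a == 0:
        return 0.0
    x = a                # previous estimate
    x_new = (a + 1) / 2  # current estimate
    # Stop once two successive estimates agree to a relative tolerance of eps.
    while abs(x - x_new) > eps * x:
        x = x_new
        x_new = (a / x_new + x_new) / 2
    return x_new
```

Even in this small form, the reader has to work out the loop condition and the update rule before being convinced that the result r satisfies r × r = n.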

Advantages of the Declarative Approach

  1. (Much) Shorter
  2. Easier to understand
  3. Independent of implementation
  4. Less likely to contain errors
  5. Easier to see it is correct
  6. Tractable

Declarative numbers

number: optional sign, digit+.
optional sign: "-"?.
digit: "0"; "1"; "2"; "3"; "4"; "5"; "6"; "7"; "8"; "9".
A number has its normal everyday meaning.

This is:

Procedural numbers: HTML5

2.4.4 Numbers

2.4.4.1 Signed integers

A string is a valid integer if it consists of one or more ASCII digits, optionally prefixed with a "-" (U+002D) character.

A valid integer without a "-" (U+002D) prefix represents the number that is represented in base ten by that string of digits. A valid integer with a "-" (U+002D) prefix represents the number represented in base ten by the string of digits that follows the U+002D HYPHEN-MINUS, subtracted from zero.

The rules for parsing integers are as given in the following algorithm. When invoked, the steps must be followed in the order given, aborting at the first step that returns a value. This algorithm will return either an integer or an error.

  1. Let input be the string being parsed.
  2. Let position be a pointer into input, initially pointing at the start of the string.
  3. Let sign have the value "positive".
  4. Skip whitespace.
  5. If position is past the end of input, return an error.
  6. If the character indicated by position (the first character) is a "-" (U+002D) character:
    1. Let sign be "negative".
    2. Advance position to the next character.
    3. If position is past the end of input, return an error.

    Otherwise, if the character indicated by position (the first character) is a "+" (U+002B) character:

    1. Advance position to the next character. (The "+" is ignored, but it is not conforming.)
    2. If position is past the end of input, return an error.
  7. If the character indicated by position is not an ASCII digit, then return an error.
  8. Collect a sequence of characters that are ASCII digits, and interpret the resulting sequence as a base-ten integer. Let value be that integer.
  9. If sign is "positive", return value, otherwise return the result of subtracting value from zero.

2.4.4.2 Non-negative integers

A string is a valid non-negative integer if it consists of one or more ASCII digits.

A valid non-negative integer represents the number that is represented in base ten by that string of digits.

The rules for parsing non-negative integers are as given in the following algorithm. When invoked, the steps must be followed in the order given, aborting at the first step that returns a value. This algorithm will return either zero, a positive integer, or an error.

  1. Let input be the string being parsed.
  2. Let value be the result of parsing input using the rules for parsing integers.
  3. If value is an error, return an error.
  4. If value is less than zero, return an error.
  5. Return value.
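
The contrast can be sketched in Python. The names VALID_INTEGER, is_valid_integer and parse_integer are hypothetical, the whitespace set is an assumption, and parse_integer is only one possible reading of the numbered steps quoted above, with None standing in for "an error".

```python
import re
from typing import Optional

# Declarative: the whole definition is one pattern (optional "-", then digits).
VALID_INTEGER = re.compile(r"-?[0-9]+")

def is_valid_integer(text: str) -> bool:
    return VALID_INTEGER.fullmatch(text) is not None

def parse_integer(text: str) -> Optional[int]:
    """Procedural: follow the quoted parsing steps one by one."""
    position = 0
    sign = 1
    # Step 4: skip whitespace (assumed here: space, tab, LF, FF, CR).
    while position < len(text) and text[position] in " \t\n\f\r":
        position += 1
    # Step 5: nothing left to parse.
    if position >= len(text):
        return None
    # Step 6: an optional sign; "+" is skipped even though it is not valid.
    if text[position] == "-":
        sign = -1
        position += 1
    elif text[position] == "+":
        position += 1
    if position >= len(text):
        return None
    # Step 7: the next character must be a digit.
    if text[position] not in "0123456789":
        return None
    # Step 8: collect the run of digits; anything after it is ignored.
    start = position
    while position < len(text) and text[position] in "0123456789":
        position += 1
    # Step 9: apply the sign.
    return sign * int(text[start:position])
```

Under this reading, is_valid_integer("+42") is false while parse_integer("+42") returns 42: the definition and the algorithm do not describe the same set of strings.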

Inflation

So the HTML5 definition of signed numbers is 16 times longer and has internal inconsistencies.

It will not surprise you to learn as a result that the HTML5 spec is very large.

HTML5, the Spec

The HTML5 Spec printed

HTML5 is almost, but not quite, entirely not about markup

You can tell by reading the HTML5 spec that it was written by programmers.

"When your only tool is a hammer, all your problems look like nails"

And yet, they're programmers and they forgot about how you use libraries in programs? The HTML5 spec is one huge monolithic program.

Declarative Markup

Declarative methods are not only for specification.

HTML used to be all about being declarative.

The poster-child of HTML declarative markup is the <a> element:

<a href="talk.html" title="..." target="..." class="..." >My Talk</a>

This compactly encapsulates a lot of behaviour including

Doing this in programming would be a lot of work.

CSS

CSS is another example of a successful declarative approach.

When W3C started the CSS activity, Netscape, at the time the leading browser, declined to join, saying that they had a better solution, JSSS, based on Javascript.

Instead of

h1 { font-size: 20pt; }

you could use script to say

document.tags.H1.fontSize = "20pt";

Wikipedia:

"JSSS lacked the various CSS selector features, supporting only simple tag name, class and id selectors. On the other hand, since it is written using a complete programming language, stylesheets can include highly complex dynamic calculations and conditional processing."

Which brings us to Javascript.

Javascript

Javascript- the definitive guide


Javascript: So Good it has Good Parts!

Javascript, the good parts book

Javascript

The good parts vs the rest

Javascript: Where even the Good Parts have Bad Parts

"Javascript: the Good Parts" is peppered with text such as this:

"If the operand [of typeof] is an array or null, then the result is 'object', which is wrong"

and

"The mechanism that Javascript provides to [make a new object] is messy and complex, but it can be significantly simplified"

and

"The best thing about Javascript is its implementation of functions. It got almost everything right. But, as you should expect with Javascript, it didn't get everything right"

So apparently even some of the good parts are bad.

Javascript Debugging

Javascript debugging is hard, because of misuse of the Robustness Principle.

The Robustness Principle (also known as Postel's Law) states:

"Be conservative in what you do,
be liberal in what you accept from others"

Although it has its uses, not everyone thinks this is necessarily a good rule. I don't, and I'm not the only one.

I personally think this principle has had a bad effect on the web: because browsers are liberal and accept all sorts of junk, authors don't know that what they wrote is wrong, and so they can't easily be conservative in what they produce.

It produces "suck it and see" coding. Once it looks OK in the browser, you stop, thus increasing the amount of junk on the web.

This is bad, because the message you think you are sending is not the message being received, but you don't know it. It can eat hours of your time trying to find out why your CSS doesn't work, only to discover it's because the browser thinks the HTML is different to what you think it is.

Robustness Principle

The Robustness Principle was proposed in order to improve interoperability between programs, which has been expressed by one wag as:

"you’ll have to interoperate with the implementations of others with lower standards. Do you really want to deal with those fools? Better to silently fix up their mistakes and move on."

However, for the reasons mentioned earlier, the Robustness Principle should never be applied to programming languages!

Unfortunately it has been to Javascript. Did you know that

++[[]][+[]]+[+[]] evaluates to the string "10" ?

Javascript is very hard to debug, since it silently accepts certain classes of errors that then don't show up until much later.

Studies have shown that 90% of the cost of software comes from debugging. Reducing the need for debugging is really important.

Parsing JSON is a Minefield

"I'll show that JSON is not the easy, idealised format as many do believe. Indeed, I did not find two libraries that exhibit the very same behaviour. Moreover, I found that edge cases and maliciously crafted payloads can cause bugs, crashes and denial of services, mainly because JSON libraries rely on specifications that have evolved over time and that left many details loosely specified or not specified at all."

http://seriot.ch/parsing_json.php
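
As a small illustration of the kind of loosely specified detail meant here, this sketch uses Python's standard json module (one library among many, not one singled out by the article). The helper names reject_constant and strict_loads are hypothetical.

```python
import json

# Duplicate keys: this library silently keeps the last value.
duplicate = json.loads('{"a": 1, "a": 2}')

# NaN is not in the JSON grammar, yet json.loads accepts it by default.
not_a_number = json.loads("NaN")

def reject_constant(name: str):
    """Called by json.loads for 'NaN', 'Infinity' and '-Infinity'."""
    raise ValueError(f"non-standard JSON constant: {name}")

def strict_loads(text: str):
    """Hypothetical stricter wrapper that refuses the non-standard constants."""
    return json.loads(text, parse_constant=reject_constant)
```

Another library may reject the duplicate key or the NaN outright, so the same payload can mean different things to sender and receiver.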

Programming

One of the problems with using programming as the basis of functionality is that standardisation flies out of the window.

Example: CSS presentation mode (which I am using here). This allows you to specify how any document can be formatted when doing a presentation.

Alas, HTML5 has taken the approach that you can do this better in Javascript. No one supports Presentation Mode any more. And there are now lots of Javascript packages to do presentation.

ALL DIFFERENT!

You can no longer switch in a different presentation package, and use that, because you have to CHANGE THE DOCUMENT.

The programmers are doing the document design, so all the documents become proprietary, and there is no interoperability, which is the whole point of standards.

Elements

This is why there are so few new elements in HTML5: they haven't done any design, and instead said "if you need anything, you can always do it in Javascript".

And they all have.

And they are all different.

Flavours of Javascript

"What flavor of Javascript are you going to use? Are you gonna use a transpiler? From what language? Grunt? Gulp? Bower? Yeoman? Browserify? Webpack? Babel? Common.js? Amd? Angular? Ember? Linting? What am I talking about? Am I mixing things up? Am I confused? "

"Talking to the community about my “analysis paralysis loop” caused by the excessive amount of available tools to choose from and to investigate resulted in the community suggesting to try out, spend time, learn and investigate four more technologies that I haven’t even considered in the first place. Good job, Javascript!"

Pistaccio

Frameworks

So which framework do you use?

Raw bare-metal Javascript? Angular? Dojo? Bootstrap? Or one of the other 26 listed on Wikipedia?

Are they compatible? No.

What happens when:

YOU HAVE TO CHANGE ALL YOUR DOCUMENTS.

This is why we need standards, not proprietary formats like frameworks.

Example incompatibilities

document.getElementById('test-table');

dojo.byId('test-table');

$('test-table')

delete Ext.elCache['test-table']; 
Ext.get('test-table');

$jq('#test-table');

YAHOO.util.Dom.get('test-table');

document.id('test-table');

Brittleness: the left-pad disaster

npm is a registry, with an accompanying website, that keeps track of Javascript packages and their dependencies.

Very popular: more than a billion downloads per week.

One guy had a package on npm called kik.

The makers of another program called kik wanted to use npm (which apparently has a single global namespace, duh).

They owned the trademark for kik, and so 'asked' him to change the name of his package. He declined the offer.

Kik the company went to npm the company, and complained; npm removed the guy's kik.

He got cross, and removed all his (250+) packages, and took them elsewhere.

Which broke the web.

How one developer just broke Node, Babel and thousands of projects in 11 lines of JavaScript

Brittleness

But that's not all:

"The author unpublished over 250 NPM modules, making those global names (e.g. "map", "alert", "iframe", "subscription", etc) available for anyone to register and replace with any code they wish.

Since these libs are now baked into various package.json configuration files (some with 10s of thousands of installs per month, "left-pad" with 2.5M/month), meaning a malicious actor could publish a new patch version bump (for every major and minor version combination) of these libs and ship whatever they want to future npm builds."

callmevlad

Availability

"The U.K.’s GDS (Government Digital Service) ran an experiment to determine how many of its users did not receive JavaScript-based enhancements, and it discovered that number to be 1.1 percent, or 1 in every 93 users. For an ecommerce site like Amazon, that’s 1.75 million people a month, which is a huge number."

alistapart

Bloat

To look at the webpage of one single tweet of 140 characters, you have to download just under a megabyte. It's 5200 lines of HTML before you even get to the five Javascript packages.

The whole of James Joyce's Ulysses is only half as long again.

"An article from 2012 titled "The Growing Epidemic of Page Bloat" warns that the average web page is over a megabyte in size.

The article itself is 1.8 megabytes long.

An article two years later, called “The Overweight Web" warns that average page size is approaching 2 megabytes.

That article is 3 megabytes long.

If present trends continue, there is the real chance that articles warning about page bloat could exceed 5 megabytes by 2020."

The Website Obesity Crisis

Speed

"Because of #GDPR, USA Today decided to run a separate version of their website for EU users, which has all the tracking scripts and ads removed. The site seemed very fast, so I did a performance audit. How fast the internet could be without all the junk!

5.2MB → 500KB

They went from a load time of more than 45 seconds to 3 seconds, from 124 (!) JavaScript files to 0, and from a total of more than 500 requests to 34."

Marcel Freinbichler

Accessibility

"Many developers who have grown up only using frameworks have a total lack of understanding about the fundamentals of HTML, such as valid and semantic markup ... This is of great concern as semantic markup is one of the core principles of an accessible web."

Russ Weakley

Programming

"You know... I feel like I blinked and then all of the sudden what I thought was my job was suddenly not my job but now I'm being told that I need to do this other stuff that I don't even like and people wonder why I'm wielding a stiletto like a weapon and screaming, "I HATE JAVASCRIPT! YOU CAN'T MAKE ME! NO MEANS NO!" and considering a second career in comedy writing."

Nicole Henninger

Example

A fake dropdown

<div class="dropdown">
   <button id="dropdownMenu1" data-toggle="dropdown"
              aria-haspopup="true" aria-expanded="true">
      Dropdown
      <span class="caret"></span>
    </button>
    <ul class="dropdown-menu" aria-labelledby="dropdownMenu1">
        <li><a href="#">Action</a></li>
        <li><a href="#">Another</a></li>
        <li><a href="#">Something</a></li>
        <li><a href="#">Separated</a></li>
    </ul>
</div>

Design...

And then there are the design techniques they used when they did do some design.

"Paving the cowpaths"

This is a design principle borrowed from architecture: when you build a campus or estate, don't pave the paths, but wait and see where people walk, so you can see where they need paths.

A desire path at the CWI

Desire paths

The HTML5 design principles document got it wrong:

"When a practice is already widespread among authors, consider adopting it rather than forbidding it or inventing something new.

Authors already use the <br/> syntax as opposed to <br> in HTML and there is no harm done by allowing that to be used."

Paving the cowpaths would be more like noticing that huge numbers of sites have a navigation drop-down, and supporting that natively.

Cows are not designers

Cowpaths are data.

If you pave cowpaths, you are setting in stone the behaviours caused by the design decisions in the past.

Cowpaths tell you where the cows want to go, not how they want to get there. If they have to take a path round a swamp to get to the meadow, then maybe it would be a better idea to drain the swamp, or build a bridge over it, rather than paving the path they take round it.

Paving cowpaths is a bad design principle in the way that they applied it. It can be a good design principle, but they apparently misunderstood it.

One example of bad cowpath-based design

The HTML5 group spidered millions of pages, because they could, and then on the basis of that data decided what should be excluded from HTML5.

This is not "paving the cowpaths"! It is exactly the opposite: it is putting fences across cowpaths that are used by fewer cows than some other paths, and even goes against their own proclaimed design principles!

For instance: @rev.

<link rel="next" href="chap2.html"/>
<link rev="prev" href="chap2.html"/>

@rel and @rev are complementary attributes, they are a pair, like +/-, up/down, left/right.

The HTML5 people decided that not enough people were using @rev, and so removed it.

  1. This breaks backwards compatibility.
  2. It prevents those who did use it from using it.

Irritated by Colon Disease

For years, the wider community had agreed to use a colon (:) to separate a name from the identification of where it came from. A colon was a legal name character, and it was chosen to be backwards compatible, but in some environments could be interpreted in a certain way.

eg: xml:lang

But no, they had to develop a new separator, the hyphen.

eg:

<div role="searchbox"
     aria-labelledby="label" 
     aria-placeholder="MM-DD-YYYY">03-14-1879</div>

The people who so disdained namespaces just went and invented them again.

Not Invented Here Syndrome

"Four social dynamics appear to underlie NIH:

  1. Belief that internal capabilities are superior to external ones.
  2. Fear of losing control.
  3. Desire for credit and status.
  4. Significant emotional and financial investment in internal initiatives."

    Lidwell et al., Universal Principles of Design

Not Invented Here Syndrome

"The amount of “not invented here” mentality that [pervades] the modern HTML5 spec is odious. Accessibility in HTML5 isn’t being decided by experts. Process, when challenged through W3C guidelines, is defended as being “not like the old ways”, in essence slapping the W3C in the face. Ian’s made it clear he won’t play by the rules. When well-meaning experts carefully announce their opposing positions and desire for some form of closing the gaps, Ian and the inner circle constantly express how they don’t understand."

http://cssquirrel.com/blog/2009/08/03/behold-leviathan-confused/

Many groups had already solved problems that HTML5 could have used, but HTML5 decided to reinvent (usually with worse results, since they were for areas that they were not experts in).

Example NIH: RDFa

The question was: How should you represent general metadata in HTML?

2003: Cross working group task force created of interested parties.

2004: First working draft of RDFa

2008: RDFa Recommendation

So RDFa represented more than 5 years of work, consensus, and agreement on how metadata should be represented in HTML and other technologies.

2009: HTML5 creates microdata out of the blue:

FUD ensues.

2013: Microdata abandoned.

Forward compatibility: Empty elements

If XML did one thing right, it introduced a new notation for empty elements:

<br/>

This one simple change meant that you could parse a document without a DTD or Schema; you could parse any document without knowledge of the elements involved, which made the parser forward-compatible.
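
A minimal sketch of that forward compatibility, using Python's standard xml.etree.ElementTree parser. The element name sparkle is made up for the example, and the helpers element_names and parses are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

def element_names(document: str) -> List[str]:
    """Parse with no DTD or schema and list the element names in document order."""
    root = ET.fromstring(document)
    return [element.tag for element in root.iter()]

def parses(document: str) -> bool:
    """Report whether the document is well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# The parser has never heard of <sparkle/>, but the notation itself says it is empty.
names = element_names("<p>one<br/>two<sparkle/>three</p>")
```

Without the trailing slash, as in "<p>one<sparkle>two</p>", a parser can only decide where sparkle ends if it already knows that sparkle is an empty element; here parses returns False for that input.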

HTML5 dropped the requirement of using this notation (probably because of Irritated by Colon Disease), meaning that they can now never add a new empty element without breaking something.

Quotes

Apparently the XHTML rules were too restrictive, since you had to enclose every attribute value in quotes. And yet:

"Even with these simplified definitions, it’s still a pain to remember all the rules for unquoted attribute values, especially as they differ between HTML and CSS. When in doubt, it’s probably best to just use quotes. If you’re confused, it’s likely to confuse your colleagues too. If you’re using user input in an attribute value, always quote (and escape) it to prevent XSS security vulnerabilities."

https://mathiasbynens.be/notes/unquoted-attribute-values

Or as another wag put it:

"You know what would be cool? JSON, but invented by a less obsessive personality -- optional quotes on properties, trailing commas, and comments." @marijnjh

Show source

Well, this has always been a problem, but it seems to be getting worse and worse. This is just a page selected at random. There are far worse examples. There is no 'document' anymore.

<div class="row">            <div class="span4 col pull-right  bTB1S  padTB30">
                                        <div class="padT0 marB10">
                                            <h4 class="franklin-bold size-one-twenty-pc marT0">‘Basically owned the technology in cryptography'</h4>
                                        <div class="franklin-light size-fourteen lh17em">
                                            </div>
                </div>
                                            <div class="video-image-wrapper">
                <div class="video-container not-lead" class="playing">
          <div class="video right-rail-video" style="height: auto; width: auto;" id="wp_69ebfb98-049e-11e5-93f4-f24d4af7f97d" data-show-endscreen="0" data-autoplay="0" data-video-uuid="69ebfb98-049e-11e5-93f4-f24d4af7f97d" data-companion-ad="0">
        <div class="innerWrapper" style="width: 100%;"></div>
        </div>
        </div>
                        <div class="image-button-wrapper">
                                            <div class="spinner">
                            <img class="image" data-right="true" data-original="http://www.obfust.com/sf/wp-content/themes/wapo-blogs/inc/imrs.php?src=http://s3.amazonaws.com/posttv-thumbnails-prod/thumbnails/55660d34e4b0ba0b9fd407d8/CROCKER1.jpg&authapi-mob-redir=0" src="http://www.obfust.com/sf/wp-content/themes/wapo-blogs/inc/imrs.php?src=http://s3.amazonaws.com/posttv-thumbnails-prod/thumbnails/55660d34e4b0ba0b9fd407d8/CROCKER1.jpg&w=1080&authapi-mob-redir=0" />
                        </div>
                                        <div class="imm-video-overlay">
                        <img data-video-id="wp_69ebfb98-049e-11e5-93f4-f24d4af7f97d" class="wp-loading imm-loading-btn-small" src="http://www.obfust.com/posttv/resources/img/loading_wp_white_100.gif" style="display: none;" alt="loading" />
                        <div class="imm-video-play-btn imm-video-play-btn-small">
                            <i class="fa fa-play"></i>
                            <span class="franklin-bold">Play Video</span>
                        </div>
                    </div>
                </div>
                    </div>
                  
                                                    <div class="marT10">
                                            <p class="marTB0 light-grey size-fourteen franklin-light">
                                                    Steve Crocker, worked on early networking technology for DARPA.                                                                        </p>>

HTML5 isn't about markup

It's only about the DOM.

The old DOM group got closed long ago; it shouldn't have been.

The Web has been ruined, by putting it in the hands of people who don't know how to design markup.

But now

The last W3C XML group is closed.

Liam is leaving W3C.

W3C has abandoned the Declarative Web

What does the Web need?

Modularity

Extensibility

Accessibility

Declarative

100 year web

The web is now the way we distribute information. We will need the web pages we create now to be readable in 100 years' time, just as we can still read 100-year-old books.

Requiring a webpage to depend on a particular 100-year-old implementation of Javascript is not exactly evidence of future-thinking.

At least declarative markup is easier to keep alive because (see the 5th slide) it is INDEPENDENT OF THE IMPLEMENTATION!

A Call to Action!

It is time for a new movement, to lead the Web to its full potential.

We should seize back the Declarative Web.

As XForms and other markups have shown, we can still create meaningful declarative documents, and serve them to HTML browsers.

HTML5 can become the assembly language of the web, and we can go back to having a coherent, declarative, author-friendly web.

A New Organisation

A new home for our work.

A place where people understand what we are doing, and offer moral support.

A place to develop new specifications

A home for declarative.

What can we do?

Technically, such things as:

Socially:

What is needed?

100 Year Web

All this can't happen overnight.

But it needs to happen.

We need to seize the initiative, and recreate a declarative, robust, lasting web.

Let HTML5 play; HTML5 is now the assembly language of the web.

It is time for some Higher-Level Languages!

Who's in?