In computing, HyperText Markup Language (HTML) is a markup language designed for the creation of web pages and other information viewable in a browser. HTML is used to structure information -- denoting certain text as headings, paragraphs, lists and so on -- and can be used to define the semantics of a document.
Originally defined by Tim Berners-Lee and further developed by the IETF with a simplified SGML syntax, HTML is now an international standard (ISO/IEC 15445:2000). The HTML specification is maintained by the World Wide Web Consortium (W3C).
In terms of file extensions, HTML documents are frequently named ".HTM", a shortened version implemented in order to get the documents to display properly on DOS/Windows 3.1 systems. This variant conforms with the 8.3 limit on file naming which was a result of the File Allocation Table file system. While unnecessary for modern versions of Windows, the shortened form remains common by convention.
Early versions of HTML were defined with looser syntactical rules which helped its adoption by those unfamiliar with web publishing. Web browsers commonly made assumptions about intent and proceeded with rendering of the page. Over time, the trend in the official standards has been to create an increasingly strict language syntax; however, browsers still continue to render pages that are far from valid HTML. HTML 4.01 is the current version of the HTML specification, although the W3C is moving toward replacing it with XHTML, which applies the stricter rules of XML to HTML.
HTML is a form of markup oriented toward the construction of single-page text documents that are rendered by specialized software called HTML user agents, the most common example of which is a web browser. HTML provides a means by which the document's content can be annotated with various kinds of metadata and rendering hints. The rendering cues may range from minor text decorations, such as specifying that a certain word be underlined or that an image be inserted, to sophisticated imagemaps and form definitions. The metadata may include information about the document's title and author, structural information such as headings, paragraphs and lists, and information that allows the document to be linked to other documents to form a hypertext web.
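As a rough illustration of how a program can read this metadata back out of a document, the sketch below uses Python's standard `html.parser` module to collect the title, the headings and the link targets of a small page. The class name `OutlineParser` and the sample document are invented for this example and are not part of any HTML specification.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collects the title, headings, and link targets of an HTML document."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []    # (tag, text) pairs in document order
        self.links = []       # href values of <a> elements
        self._current = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in self.HEADINGS:
            self._current = tag
            self._buffer = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._buffer).strip()
            if tag == "title":
                self.title = text
            else:
                self.headings.append((tag, text))
            self._current = None


# Hypothetical sample document used only for this illustration.
SAMPLE = """<html>
<head><title>Example page</title></head>
<body>
<h1>Introduction</h1>
<p>See the <a href="history.html">history</a> page.</p>
<h2>Details</h2>
</body>
</html>"""

parser = OutlineParser()
parser.feed(SAMPLE)
parser.close()
print(parser.title, parser.headings, parser.links)
```

The parser never looks at how the page would be drawn; it relies only on the structural annotations, which is the kind of reuse that metadata and structural markup make possible.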
HTML is a text-based format designed to be both readable and editable by humans using a text editor. However, writing and updating a large number of pages by hand is time-consuming, requires a good knowledge of HTML, and can make consistency difficult to maintain. Visual HTML editors such as Macromedia Dreamweaver, Adobe GoLive or Microsoft FrontPage allow web pages to be created much like word processor documents. However, the code generated by these programs is frequently of poor quality.
HTML can be generated on the fly using a server-side scripting system such as PHP, JSP or ASP. Many web applications like content management systems, wikis and web forums generate HTML pages.
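The idea of generating HTML on the fly can be sketched in a few lines. The example below is written in Python rather than PHP, JSP or ASP, purely for illustration; the function name `render_page` and the sample data are assumptions made for this sketch, and a real application would more likely use a template system.

```python
from html import escape


def render_page(title, items):
    """Return a complete HTML page that lists the given items."""
    # Escape all dynamic text so that characters such as < and & are
    # shown literally instead of being interpreted as markup.
    rows = "\n".join(f"<li>{escape(item)}</li>" for item in items)
    return (
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        f"<ul>\n{rows}\n</ul>\n"
        "</body>\n"
        "</html>"
    )


# Hypothetical data, such as a forum or wiki might load from a database.
page = render_page(
    "Recent posts",
    ["Hello, world", "Tips & tricks", "Using <b> sparingly"],
)
print(page)
```

The same function produces a different page for each set of items, which is what content management systems, wikis and forums do on every request.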
HTML is also used in email. Many email clients include a GUI HTML editor for composing messages and a rendering engine for displaying them once received. The use of HTML in email is controversial for several reasons. The most obvious is size: an email with a lot of formatting is much larger than its plain-text equivalent, and the problem is made worse by the fact that, for compatibility, most clients send a plain-text version as well. Other issues are the overuse of formatting (at one stage there was a craze for making letterheads in HTML and sending them as part of every email) and the potential security risks of rendering a format as complex as HTML. For these reasons many mailing lists deliberately block HTML email, either stripping out the HTML part to leave only the plain-text part or rejecting the entire message.
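The combination of an HTML part with a plain-text fallback can be illustrated with Python's standard `email` package. This is a minimal sketch: the addresses and message text are placeholders, and the helper `plain_text_only` is a hypothetical stand-in for the kind of stripping a mailing list might perform, not the behaviour of any particular list software.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Meeting notes"
msg["From"] = "alice@example.org"  # placeholder address
msg["To"] = "list@example.org"     # placeholder address

# Plain-text body first: clients that cannot render HTML display this part.
msg.set_content("Meeting moved to 3 pm.\n")

# Adding an HTML alternative turns the message into multipart/alternative,
# so both versions of the same content travel in one email.
msg.add_alternative(
    "<html><body><p>Meeting moved to <b>3 pm</b>.</p></body></html>\n",
    subtype="html",
)


def plain_text_only(message):
    """Return the text of the text/plain part, or None if there is none."""
    part = message.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else None


print(msg.get_content_type())
print(plain_text_only(msg))
```

Because the message carries the same content twice, its serialized form is necessarily larger than the plain-text body alone, which is the size concern described above.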
There is no official standard HTML 1.0 specification because there were multiple informal HTML standards at the time. However, some people consider the initial edition provided by Tim Berners-Lee to be the definitive HTML 1.0. That version did not include an IMG tag. Work on a successor for HTML, then called 'HTML+', began in late 1993, designed originally to be "A superset of HTML … which will allow a gradual rollover from the previous format of HTML". The first formal specification was therefore given the version number 2.0 in order to distinguish it from these unofficial "standards". Work on HTML+ continued, but this never became a standard.
The HTML 3.0 standard was proposed by the newly formed W3C in March 1995, and provided many new capabilities such as support for tables, text flow around figures and the display of complex math elements. Even though it was designed to be compatible with HTML 2.0, it was too complex at the time to be implemented, and when the draft expired in September 1995 work in this direction was discontinued due to lack of browser support. HTML 3.1 was never officially proposed, and the next standard proposal was HTML 3.2 (code-named 'Wilbur'), which dropped the majority of the new features in HTML 3.0 and instead adopted many browser-specific elements and attributes which had been created for the Netscape and Mosaic web browsers. Support for math as proposed by HTML 3.0 finally came about years later with a different standard, MathML.
HTML 4.0 likewise adopted many browser-specific elements and attributes, but at the same time began to try to 'clean up' the standard by marking some of them as deprecated, and suggesting they not be used.
Minor editorial revisions to the HTML 4.0 specification were published as HTML 4.01. Due to the advent of XHTML, there will not be any more new versions of HTML. The most common extension for HTML files is '.html'. Earlier operating systems limited file extensions to three letters, so '.htm' was also used; it is less common now, but it is interpreted the same way and works with most browsers.
Below are the kinds of markup elements in HTML.
In order to specify which version of the HTML standard they conform to, all HTML documents should start with a Document Type Declaration (informally, a "DOCTYPE"), which makes reference to a Document Type Definition (DTD). For example:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
This defines a document that conforms to the Strict DTD of HTML 4.01, which is purely structural, leaving formatting to Cascading Style Sheets. In some cases, the presence or absence of an appropriate DOCTYPE may influence how a web browser displays the page: some browsers use it to choose between a standards-compliant rendering mode and a backward-compatible "quirks" mode.
In addition to the Strict DTD, HTML 4.01 provided the Transitional and Frameset DTDs. The Transitional DTD was intended to gradually phase in the changes made in the Strict DTD, while the Frameset DTD was intended for those documents which contained frames.
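The three HTML 4.01 DTDs are distinguished by the public identifier in the DOCTYPE. As a hedged sketch of how a tool might tell them apart, the Python code below reads the declaration with the standard `html.parser` module and looks the identifier up in a table. The names `DoctypeSniffer` and `html401_variant` are invented for this example, and real browsers apply considerably more elaborate rules.

```python
from html.parser import HTMLParser

# Public identifiers of the three HTML 4.01 DTDs.
HTML401_DTDS = {
    "-//W3C//DTD HTML 4.01//EN": "Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "Transitional",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "Frameset",
}


class DoctypeSniffer(HTMLParser):
    """Remembers the first DOCTYPE declaration seen in a document."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the declaration text without the surrounding "<!" and ">".
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def html401_variant(document):
    """Return 'Strict', 'Transitional', 'Frameset', or None if unrecognized."""
    sniffer = DoctypeSniffer()
    sniffer.feed(document)
    sniffer.close()
    if sniffer.doctype is None:
        return None
    for public_id, name in HTML401_DTDS.items():
        # Match the quoted identifier so that Strict is not confused
        # with the longer Transitional and Frameset identifiers.
        if f'"{public_id}"' in sniffer.doctype:
            return name
    return None


STRICT_DOC = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    "<html><head><title>Example</title></head><body><p>Text</p></body></html>"
)
print(html401_variant(STRICT_DOC))
```

A document with no DOCTYPE at all yields `None` here, which corresponds to the case in which a browser has to guess how the page should be handled.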
Efforts of the web development community have led to a new thinking in the way a web document should be written; XHTML epitomizes this effort. Standards stress using markup which suggests the structure of the document, like headings, paragraphs, block quoted text, and tables, instead of using markup which is written for visual purposes only, like <font>, <b> (bold), and <i> (italics). Some of these elements are not permitted in certain varieties of HTML, like HTML 4.01 Strict. CSS provides a way to separate the HTML structure from the content's presentation, by keeping all code dealing with presentation defined in a CSS file. See separation of style and content.
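To make the distinction between structural and purely visual markup concrete, the sketch below scans a fragment for the visual elements named above (font, b and i) and reports where they occur. It is only an illustration in Python: the names `PresentationalFinder` and `find_presentational`, the two sample fragments and the `warning` class are assumptions made for this example.

```python
from html.parser import HTMLParser

# Elements that the text above names as purely visual.
PRESENTATIONAL = {"font", "b", "i"}


class PresentationalFinder(HTMLParser):
    """Records each presentational start tag and the line it appears on."""

    def __init__(self):
        super().__init__()
        self.found = []  # (tag, line number) pairs

    def handle_starttag(self, tag, attrs):
        if tag in PRESENTATIONAL:
            line, _column = self.getpos()
            self.found.append((tag, line))


def find_presentational(document):
    finder = PresentationalFinder()
    finder.feed(document)
    finder.close()
    return finder.found


# Visual markup mixed into the content.
OLD_STYLE = """<p><font color="red"><b>Warning:</b></font> do not
press the <i>red</i> button.</p>"""

# Structural markup; the appearance would be defined in a separate CSS file.
NEW_STYLE = """<p><strong class="warning">Warning:</strong> do not
press the <em>red</em> button.</p>"""

print(find_presentational(OLD_STYLE))
print(find_presentational(NEW_STYLE))
```

In the second fragment the colour of the warning would come from a style sheet rule for the hypothetical `warning` class, so the look of every warning on a site could be changed in one place without touching the documents themselves.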