The HTML lang attribute and how to overwrite it

Since I have stopped my current project with the new theme, I am now allowing myself to blog about other topics again. This week, I want to write about an important global HTML attribute: the lang attribute.

Maybe some of you have never (actively) used this attribute, but it’s quite important. Not only does it tell a search engine what language the content on your website is written in, it also tells assistive technology, like a screen reader, which voice to use when reading the text. It can even be set inline for a single word or part of a text.

The main lang attribute, however, is set on the <html> tag. In WordPress, this is handled by the language_attributes() function, which you would usually find in the header.php file of a classic theme. In a block theme, this is handled automatically by Core.

Reasons to overwrite the lang attribute

You would usually not want to change the value of the lang attribute, since WordPress will use the correct one, based on the language setting of your website. But there are cases in which you might want to change it.

Multilingual websites

WordPress can only have one frontend language, unless you install a multilingual plugin. I use MultilingualPress, which is based on multisite. In a WordPress multisite, you can set the language per sub-site. Each site will then automatically use the correct lang attribute.

If your website does not use a multilingual plugin, but you have one page with a different language, you could overwrite the lang attribute with some code.

Loading of external code

Another use case is plugins that use the lang attribute to load external data. I came across a cookie banner plugin this week that loads the text for the banner from an external resource. It uses the exact value of the lang attribute, but it expects a two-character value like en. WordPress, however, uses a value like en-US, which does not work for this cookie banner. So we need to strip the second part of the value.

CSS using the attribute

A good example of a use case in CSS is the quotes property. Different languages use different quotation marks. When you want to use the proper quotation marks in a <q> HTML tag, you usually don’t have to do anything, since the browser handles that for you: the default value is quotes: auto. But if you want to overwrite this, you could do the following:

q {
	quotes: "«" "»" "‹" "›";
}

This would always use quotes that are used in French and other languages, even if your lang attribute is set to en.

Some CSS libraries use the lang attribute to change styles, but they might be doing it like this:

[lang="en"] q {
	/* Some styles */
}

This would not work if the value of the lang attribute is en-US, because the selector only matches the exact value en. The CSS :lang() pseudo-class would work here:

:lang(en) {
	/* Some styles */
}

If you use en here, it also matches en-US, en-GB, etc. But if you use en-US, it does not match a plain en.
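
If you ever need the same kind of check on the server side, the matching rule can be approximated in a few lines of PHP. This is only a sketch of the prefix idea described above: my_lang_matches() is a made-up helper name, and browsers implement a more complete language range matching than this.

```php
<?php
/**
 * Simplified version of the :lang() matching rule: the range matches if it is
 * equal to the tag, or if the tag starts with the range followed by a hyphen.
 * Hypothetical helper, not part of WordPress or PHP.
 */
function my_lang_matches( string $range, string $tag ): bool {
	// Language tags are case-insensitive, so compare in lowercase.
	$range = strtolower( $range );
	$tag   = strtolower( $tag );

	return $tag === $range || 0 === strpos( $tag, $range . '-' );
}

var_dump( my_lang_matches( 'en', 'en-US' ) ); // bool(true)
var_dump( my_lang_matches( 'en-US', 'en' ) ); // bool(false)
var_dump( my_lang_matches( 'en', 'eng' ) );   // bool(false)
```

The trailing hyphen in the comparison is what keeps en from matching an unrelated tag like eng.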

As the CSS from such a framework might be static, overwriting it might be too complicated, so you might instead want to change the value of the lang attribute on the <html> tag.

How to change the value?

Let’s say we want to change the value to a different static value for a specific page. You could do something like this:

function my_static_lang_attribute( $output ) {
	$object = get_queried_object();
	// Only posts and pages have a post_name; terms, users and archives do not.
	if ( $object instanceof WP_Post && str_contains( $object->post_name, 'english' ) ) {
		return 'lang="en-US"';
	}

	return $output;
}

add_filter( 'language_attributes', 'my_static_lang_attribute' );

This would overwrite the lang attribute of any page/post with “english” in the permalink to lang="en-US" for the <html> tag.

As you can see from the function, the filter would not only return the value, but also the attribute name. If you look at the full code of the get_language_attributes function, you can see that the function may return other attributes like dir as well:

function get_language_attributes( $doctype = 'html' ) {
	$attributes = array();

	if ( function_exists( 'is_rtl' ) && is_rtl() ) {
		$attributes[] = 'dir="rtl"';
	}

	$lang = get_bloginfo( 'language' );
	if ( $lang ) {
		if ( 'text/html' === get_option( 'html_type' ) || 'html' === $doctype ) {
			$attributes[] = 'lang="' . esc_attr( $lang ) . '"';
		}

		if ( 'text/html' !== get_option( 'html_type' ) || 'xhtml' === $doctype ) {
			$attributes[] = 'xml:lang="' . esc_attr( $lang ) . '"';
		}
	}

	$output = implode( ' ', $attributes );

	/**
	 * Filters the language attributes for display in the 'html' tag.
	 *
	 * @since 2.5.0
	 * @since 4.3.0 Added the `$doctype` parameter.
	 *
	 * @param string $output A space-separated list of language attributes.
	 * @param string $doctype The type of HTML document (xhtml|html).
	 */
	return apply_filters( 'language_attributes', $output, $doctype );
}

Other plugins could also hook into this filter, so overwriting the $output with something static might not work as expected: it discards whatever they added, as well as a dir="rtl" set by Core. Unfortunately, there is no filter to change only the $lang value, and hooking into get_bloginfo() to overwrite the language might break other places where this value is used. If you want to strip the second part of the value, you could use a regular expression like this:

function my_dynamic_lang_attribute( $output ) {
	// Require the hyphen, so a value without a region (e.g. lang="ja") stays untouched.
	return preg_replace( '/lang="([a-z]+)-[^"]+"/i', 'lang="$1"', $output );
}
add_filter( 'language_attributes', 'my_dynamic_lang_attribute' );
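
The same idea can be used when you want a different language, but don't want to lose the other attributes. The following is a minimal sketch, assuming the new value is a valid language tag. The helper name my_replace_lang_value() is made up for this example. It only swaps the value of lang (and xml:lang) and leaves everything else in the string alone.

```php
<?php
/**
 * Replace only the value of lang / xml:lang in the attribute string.
 * Other attributes, like dir="rtl", are kept as they are.
 * Hypothetical helper; $new_lang is assumed to be a valid language tag.
 */
function my_replace_lang_value( string $output, string $new_lang ): string {
	$safe = htmlspecialchars( $new_lang, ENT_QUOTES );

	// \b makes sure attributes like hreflang="..." are not matched.
	$result = preg_replace_callback(
		'/\blang="[^"]*"/',
		static function () use ( $safe ) {
			return 'lang="' . $safe . '"';
		},
		$output
	);

	// preg_replace_callback() returns null on error; fall back to the input.
	return $result ?? $output;
}

echo my_replace_lang_value( 'dir="rtl" lang="he-IL"', 'en-US' ), "\n";
// dir="rtl" lang="en-US"
```

Inside a language_attributes filter callback, you could then return my_replace_lang_value( $output, 'en-US' ) instead of a static string.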

If you need something even more complex, it’s probably best to just overwrite the whole function.
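
What such a replacement could look like is sketched here, under a few assumptions: the function name my_build_language_attributes() is made up, it only covers the HTML case (no xml:lang), and the check on the value is a rough plausibility test, not a full validation of language tags. In a classic theme, you would call it in header.php in place of language_attributes() and pass in values like get_bloginfo( 'language' ) and is_rtl().

```php
<?php
/**
 * Build the attribute string from explicit inputs.
 * Hypothetical helper, loosely modelled on get_language_attributes().
 */
function my_build_language_attributes( string $lang, bool $is_rtl = false ): string {
	$attributes = array();

	if ( $is_rtl ) {
		$attributes[] = 'dir="rtl"';
	}

	// Rough plausibility check only: a 2-3 letter language code, optionally
	// followed by hyphen-separated alphanumeric parts (e.g. en, en-US, zh-Hans-CN).
	if ( preg_match( '/^[a-z]{2,3}(-[a-z0-9]{2,8})*\z/i', $lang ) ) {
		// The pattern only allows letters, digits and hyphens, so no escaping is needed.
		$attributes[] = 'lang="' . $lang . '"';
	}

	return implode( ' ', $attributes );
}

echo my_build_language_attributes( 'he-IL', true ), "\n";
// dir="rtl" lang="he-IL"
```

A value that does not pass the check is simply left out, so the function never prints an obviously broken lang attribute.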

Conclusion

The lang attribute is a very important attribute that every website should set. But the value might not always be what you need it to be. In those cases, you have a filter you can use to overwrite its value, but always make sure not to return something invalid.

Posted by

Bernhard is a full time web developer who likes to write WordPress plugins in his free time and is an active member of the WP Meetups in Berlin and Potsdam.
