jsoup 1.19.1
Changes
- Added support for http/2 requests in `Jsoup.connect()`, when running on Java 11+, via the Java HttpClient implementation. #2257
  - In this version of jsoup, the default is to make requests via the `HttpURLConnection` implementation: use `System.setProperty("jsoup.useHttpClient", "true");` to enable making requests via the HttpClient instead, which will enable http/2 support, if available. This will become the default in a later version of jsoup, so now is a good time to validate it.
  - If you are repackaging the jsoup jar in your deployment (i.e. creating a shaded- or a fat-jar), make sure to specify that as a Multi-Release JAR.
  - If the `HttpClient` impl is not available in your JRE, requests will continue to be made via `HttpURLConnection` (in `http/1.1` mode).
- Updated the minimum Android API Level validation from 10 to 21. As with previous jsoup versions, Android developers need to enable core library desugaring. The minimum Java version remains Java 8. #2173
- Removed previously deprecated class: `org.jsoup.UncheckedIOException` (replace with `java.io.UncheckedIOException`); moved previously deprecated method `Element Element#forEach(Consumer)` to `void Element#forEach(Consumer)`. #2246
- Deprecated the methods `Document#updateMetaCharsetElement(boolean)` and `Document#updateMetaCharsetElement()`, as the setting had no effect. When `Document#charset(Charset)` is called, the document's meta charset or XML encoding instruction is always set. #2247
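
To try the HTTP/2 opt-in described in the first item above, a minimal sketch might look like the following. It uses only JDK calls: it sets the `jsoup.useHttpClient` property named in these notes and checks whether `java.net.http.HttpClient` is present in the running JRE. The class name `Http2OptIn` and its helper methods are hypothetical, and setting the property before the first request is an assumption about when jsoup reads it.

```java
public class Http2OptIn {

    /** Returns true when the running JRE ships java.net.http.HttpClient (Java 11+). */
    static boolean httpClientAvailable() {
        try {
            Class.forName("java.net.http.HttpClient");
            return true;
        } catch (ClassNotFoundException e) {
            return false; // e.g. Java 8: jsoup keeps using HttpURLConnection
        }
    }

    /** Sets the opt-in property from the release notes. Assumption: call this before any Jsoup.connect() request. */
    static void enableHttpClient() {
        System.setProperty("jsoup.useHttpClient", "true");
    }

    public static void main(String[] args) {
        enableHttpClient();
        System.out.println("jsoup.useHttpClient = " + System.getProperty("jsoup.useHttpClient"));
        System.out.println("java.net.http.HttpClient present: " + httpClientAvailable());
        // From here, requests made via Jsoup.connect(url) should go through HttpClient when it is available.
    }
}
```

Calling `enableHttpClient()` once at application startup keeps the opt-in in one place, which makes it easy to remove once HttpClient becomes the default.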
Improvements
- When cleaning HTML with a `Safelist` that preserves relative links, the `isValid()` method will now consider these links valid. Additionally, the enforced attribute `rel=nofollow` will only be added to external links when configured in the safelist. #2245
- Added `Element#selectStream(String query)` and `Element#selectStream(Evaluator)` methods, that return a `Stream` of matching elements. Elements are evaluated and returned as they are found, and the stream can be terminated early. #2092
- `Element` objects now implement `Iterable`, enabling them to be used in enhanced for loops.
- Added support for fragment parsing from a `Reader` via `Parser#parseFragmentInput(Reader, Element, String)`. #1177
- Reintroduced CLI executable examples, in `jsoup-examples.jar`. #1702
- Optimized performance of selectors like `#id .class` (and other similar descendant queries) by around 4.6x, by better balancing the Ancestor evaluator's cost function in the query planner. #2254
- Removed the legacy parsing rules for `<isindex>` tags, which would autovivify a `form` element with labels. This is no longer in the spec.
- Added `Elements.selectFirst(String cssQuery)` and `Elements.expectFirst(String cssQuery)`, to select the first matching element from a list.
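
As an illustration of the new selection methods listed above, the sketch below uses `selectStream(String)` with early termination and `Elements.selectFirst(String)`. It assumes jsoup 1.19.1 is on the classpath; the class name `SelectExamples`, its helper methods, and the sample HTML are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectExamples {
    // Hypothetical sample markup used only for this sketch.
    static final String HTML =
        "<div id='menu'><p class='item'>One</p><p class='item'>Two</p><p class='item'>Three</p></div>";

    /** Streams matches for a descendant query and stops after the first two. */
    static List<String> firstTwoItems() {
        Document doc = Jsoup.parse(HTML);
        return doc.selectStream("#menu .item")
                  .limit(2)               // the stream can be terminated early
                  .map(Element::text)
                  .collect(Collectors.toList());
    }

    /** Selects the first match from within an Elements list; returns "" if nothing matches. */
    static String firstItemInList() {
        Elements divs = Jsoup.parse(HTML).select("div");
        Element first = divs.selectFirst("p.item");
        return first == null ? "" : first.text();
    }

    public static void main(String[] args) {
        System.out.println(firstTwoItems());
        System.out.println(firstItemInList());
    }
}
```

`expectFirst` could be used in place of `selectFirst` when a missing match should be treated as an error rather than handled with a null check.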
Bug Fixes
- If an element has a `;` in an attribute name, it could not be converted to a W3C DOM element, and so subsequent XPath queries could miss that element. Now, the attribute name is more completely normalized. #2244
- For backwards compatibility, reverted the internal attribute key for doctype names to "name". #2241
- In `Connection`, skip cookies that have no name, rather than throwing a validation exception. #2242
- When running on JDK 1.8, the error `java.lang.NoSuchMethodError: java.nio.ByteBuffer.flip()Ljava/nio/ByteBuffer;` could be thrown when calling `Response#body()` after parsing from a URL and the buffer size was exceeded. #2250
- For backwards compatibility, allow `null` InputStream inputs to `Jsoup.parse(InputStream stream, ...)`, by returning an empty `Document`. #2252
- A `template` tag containing an `li` within an open `li` would be parsed incorrectly, as it was not recognized as a "special" tag (which have additional processing rules). Also, added the SVG and MathML namespace tags to the list of special tags. #2258
- A `template` tag containing a `button` within an open `button` would be parsed incorrectly, as the "in button scope" check was not aware of the `template` element. Corrected other instances including MathML and SVG elements, also.
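
The restored `null` InputStream handling noted above can be checked with a small sketch like this one. It assumes jsoup 1.19.1 is on the classpath; the class name `NullInputExample`, the helper method, and the base URI are hypothetical. The checked `IOException` is wrapped in `java.io.UncheckedIOException`, the JDK class that replaces the removed `org.jsoup.UncheckedIOException`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class NullInputExample {

    /** Parses a null stream; per the release notes this yields an empty Document instead of failing. */
    static Document parseNullStream() {
        InputStream none = null;
        try {
            return Jsoup.parse(none, "UTF-8", "https://example.com/");
        } catch (IOException e) {
            // Wrap the checked exception in the JDK's unchecked variant.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseNullStream();
        System.out.println("Parsed document is non-null: " + (doc != null));
    }
}
```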