stormcrawler
A scalable, mature and versatile web crawler based on Apache Storm
stormcrawler-3.4.0
⚠️ Breaking Change: TextExtractor Renamed and Refactored
Applies to: Users who directly used, extended, or overrode TextExtractor via textextractor.class in crawler.yaml.
What Changed:
- TextExtractor has been renamed and is now an interface.
- The default implementation is now called JSoupTextExtractor.
- If you previously specified TextExtractor via textextractor.class, you must now use the fully qualified name of the new class:
  textextractor.class: "org.apache.stormcrawler.parse.JSoupTextExtractor"
  or just remove the line, as it is the default anyway.
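For illustration, the updated setting might sit in crawler.yaml as in the fragment below. This is a minimal sketch: the top-level config: key is assumed from the layout of StormCrawler's stock configuration files, and your own file may be organised differently.

```yaml
config:
  # Any earlier value that pointed at the old TextExtractor class
  # has to be replaced, because TextExtractor is now an interface.

  # Name the JSoup-based implementation explicitly...
  textextractor.class: "org.apache.stormcrawler.parse.JSoupTextExtractor"

  # ...or delete the textextractor.class line entirely;
  # JSoupTextExtractor is the default when the key is absent.
```

Removing the line is probably the lower-maintenance choice for anyone who never customised the extractor, since the configuration then keeps following whatever the default implementation is.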
No Action Needed If:
Migration Notes:
What's Changed
- Rel stormcrawler 3.3.0 rc1 by @tballison in https://github.com/apache/stormcrawler/pull/1507
- Bump junit.version from 5.12.0 to 5.12.1 by @dependabot in https://github.com/apache/stormcrawler/pull/1498
- Bump org.apache:apache from 33 to 34 by @dependabot in https://github.com/apache/stormcrawler/pull/1506
- Bump com.microsoft.playwright:playwright from 1.50.0 to 1.51.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1504
- Bump org.apache.solr:solr-solrj from 9.8.0 to 9.8.1 by @dependabot in https://github.com/apache/stormcrawler/pull/1500
- Bump org.mockito:mockito-core from 5.16.0 to 5.16.1 by @dependabot in https://github.com/apache/stormcrawler/pull/1499
- Bump selenium.version from 4.29.0 to 4.30.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1505
- Regenerated License file after dependency upgrades by @github-actions in https://github.com/apache/stormcrawler/pull/1508
- Bump org.apache.maven.plugins:maven-surefire-plugin from 3.5.2 to 3.5.3 by @dependabot in https://github.com/apache/stormcrawler/pull/1509
- #621 Async queries in Solr by @mvolikas in https://github.com/apache/stormcrawler/pull/1488
- Update README and compiler target to Java 17 in several plugins by @rzo1 in https://github.com/apache/stormcrawler/pull/1518
- #1516 - Add config options to change the response buffer size in OpenSearch by @rzo1 in https://github.com/apache/stormcrawler/pull/1517
- Bump de.thetaphi:forbiddenapis from 3.8 to 3.9 by @dependabot in https://github.com/apache/stormcrawler/pull/1513
- Bump org.jacoco:jacoco-maven-plugin from 0.8.12 to 0.8.13 by @dependabot in https://github.com/apache/stormcrawler/pull/1511
- Bump selenium.version from 4.30.0 to 4.31.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1510
- Bump org.mockito:mockito-core from 5.16.1 to 5.17.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1512
- Regenerated License file after dependency upgrades by @github-actions in https://github.com/apache/stormcrawler/pull/1519
- Bump junit.version from 5.12.1 to 5.12.2 by @dependabot in https://github.com/apache/stormcrawler/pull/1520
- Bump com.ibm.icu:icu4j from 76.1 to 77.1 by @dependabot in https://github.com/apache/stormcrawler/pull/1501
- Regenerated License file after dependency upgrades by @github-actions in https://github.com/apache/stormcrawler/pull/1521
- Fixes Update NOTICE File to Reflect 2025 by @rzo1 in https://github.com/apache/stormcrawler/pull/1522
- #1298 - Re-enable hold on failure (on coverage fail) by @rzo1 in https://github.com/apache/stormcrawler/pull/1523
- Bump testcontainers.version from 1.20.6 to 1.21.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1524
- Bump org.jsoup:jsoup from 1.19.1 to 1.20.1 by @dependabot in https://github.com/apache/stormcrawler/pull/1530
- Regenerated License file after dependency upgrades by @github-actions in https://github.com/apache/stormcrawler/pull/1531
- Bump aws.version from 1.12.782 to 1.12.783 by @dependabot in https://github.com/apache/stormcrawler/pull/1529
- Bump com.microsoft.playwright:playwright from 1.51.0 to 1.52.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1527
- Bump selenium.version from 4.31.0 to 4.32.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1526
- Bump org.wiremock:wiremock from 3.12.1 to 3.13.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1525
- Regenerated License file after dependency upgrades by @github-actions in https://github.com/apache/stormcrawler/pull/1534
- Bump org.apache.maven.plugins:maven-archetype-plugin from 3.3.1 to 3.4.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1535
- Bump org.apache.maven.archetype:archetype-packaging from 3.3.1 to 3.4.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1536
- Bump org.mockito:mockito-core from 5.17.0 to 5.18.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1540
- Bump selenium.version from 4.32.0 to 4.33.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1539
- Regenerated License file after dependency upgrades by @github-actions in https://github.com/apache/stormcrawler/pull/1541
- Remove Incubating references since we have graduated by @rzo1 in https://github.com/apache/stormcrawler/pull/1538
- Fix versions of SC in the READMEs + added instructions in RELEASING by @jnioche in https://github.com/apache/stormcrawler/pull/1543
- #1545 Use same version of URLFrontier as in the module by @jnioche in https://github.com/apache/stormcrawler/pull/1546
- Bump testcontainers.version from 1.21.0 to 1.21.1 by @dependabot in https://github.com/apache/stormcrawler/pull/1549
- Bump junit.version from 5.12.2 to 5.13.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1548
- Bump aws.version from 1.12.783 to 1.12.785 by @dependabot in https://github.com/apache/stormcrawler/pull/1551
- Bump junit.version from 5.13.0 to 5.13.1 by @dependabot in https://github.com/apache/stormcrawler/pull/1550
- Bump tika.version from 3.1.0 to 3.2.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1547
- Bump com.github.ben-manes.caffeine:caffeine from 3.2.0 to 3.2.1 by @dependabot in https://github.com/apache/stormcrawler/pull/1553
- Regenerated License file after dependency upgrades by @github-actions in https://github.com/apache/stormcrawler/pull/1554
- #1555 - Storm 2.8.1 by @rzo1 in https://github.com/apache/stormcrawler/pull/1556
- Regenerated License file after dependency upgrades by @github-actions in https://github.com/apache/stormcrawler/pull/1557
- Bump aws.version from 1.12.785 to 1.12.787 by @dependabot in https://github.com/apache/stormcrawler/pull/1563
- Bump org.apache:apache from 34 to 35 by @dependabot in https://github.com/apache/stormcrawler/pull/1562
- Bump org.wiremock:wiremock from 3.13.0 to 3.13.1 by @dependabot in https://github.com/apache/stormcrawler/pull/1561
- #1246 - Make ProxyManager to return optional incase no proxy is used by @quangdutran in https://github.com/apache/stormcrawler/pull/1532
- Regenerated License file after dependency upgrades by @github-actions in https://github.com/apache/stormcrawler/pull/1564
- Enable GH discussions by @rzo1 in https://github.com/apache/stormcrawler/pull/1565
- #621 add batching for cloud updates, fix cloud requests by @mvolikas in https://github.com/apache/stormcrawler/pull/1544
- #1558 - Add a LLM-based TextExtractor (OpenAI API compatible) by @rzo1 in https://github.com/apache/stormcrawler/pull/1559
- Bump testcontainers.version from 1.21.1 to 1.21.2 by @dependabot in https://github.com/apache/stormcrawler/pull/1568
- Bump org.codehaus.mojo:license-maven-plugin from 2.5.0 to 2.6.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1567
- Regenerated License file after dependency upgrades by @github-actions in https://github.com/apache/stormcrawler/pull/1569
- Bump dev.langchain4j:langchain4j from 1.0.1 to 1.1.0 by @dependabot in https://github.com/apache/stormcrawler/pull/1574
- Regenerated License file after dependency upgrades by @github-actions in https://github.com/apache/stormcrawler/pull/1575
- Bump org.jsoup:jsoup from 1.20.1 to 1.21.1 by @dependabot in https://github.com/apache/stormcrawler/pull/1576
- Regenerated License file after dependency upgrades by @github-actions in https://github.com/apache/stormcrawler/pull/1577
New Contributors
- @quangdutran made their first contribution in https://github.com/apache/stormcrawler/pull/1532
Full Changelog: https://github.com/apache/stormcrawler/compare/stormcrawler-3.3.0...stormcrawler-3.4.0