Your site has a large archive of content pages. These pages are isolated or not well linked to each other. If the pages on your site do not properly reference one another, you can list them in a sitemap to make sure Google does not overlook them.
Your site is new and has few external links pointing to it. Googlebot and other crawlers discover a page by following links from another page. With no external links at all, Google may simply never find your pages.
Your site uses a lot of rich media content, is shown in Google News, or uses other sitemap-compatible annotations.
For large sites, not every page will be reachable through internal links (for example, an e-commerce site rarely links to every single product from other pages), so defining a sitemap is necessary there. Small to medium sites, however, may already have every page properly interlinked, and the conclusion from Google's documentation is that a sitemap is not strictly required for them.
That said, it is often mentioned that crawling happens a bit faster when a sitemap exists and is submitted to the search engines. It is also commonly said that submitting a sitemap in Google Search Console pays off: you can compare the number of pages you expect with the number Google actually fetched. This way you can detect whether Google fails to crawl parts of the site you want crawled.
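To make this concrete, here is what a minimal sitemap file looks like. The format is defined by the sitemaps.org protocol; the URL and values below are placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>https://example.com/</loc>
        <lastmod>2023-01-01</lastmod>
        <priority>0.8</priority>
    </url>
</urlset>

Each url entry lists a page's location plus optional hints such as lastmod and priority, which is what the setPriority calls in the Laravel examples below map onto.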
The examples below use the spatie/laravel-sitemap package. A console command that generates the sitemap looks like this:

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Spatie\Sitemap\SitemapGenerator;

class GenerateSitemap extends Command
{
    /**
     * The console command name.
     *
     * @var string
     */
    protected $signature = 'sitemap:generate';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Generate the sitemap.';

    /**
     * Execute the console command.
     *
     * @return mixed
     */
    public function handle()
    {
        // modify this to your own needs
        SitemapGenerator::create(config('app.url'))
            ->writeToFile(public_path('sitemap.xml'));
    }
}
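With the command in place, you will usually want to run it on a schedule so the sitemap stays current. A minimal sketch, assuming the sitemap:generate signature above and a standard Laravel console kernel:

// app/Console/Kernel.php

use Illuminate\Console\Scheduling\Schedule;

protected function schedule(Schedule $schedule)
{
    // Regenerate the sitemap once a day.
    $schedule->command('sitemap:generate')->daily();
}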
The package ships with a config file that controls the crawler:

return [

    /*
     * These options will be passed to GuzzleHttp\Client when it is created.
     * For in-depth information on all options see the Guzzle docs:
     *
     * http://docs.guzzlephp.org/en/stable/request-options.html
     */
    'guzzle_options' => [

        /*
         * Whether or not cookies are used in a request.
         */
        RequestOptions::COOKIES => true,

        /*
         * The number of seconds to wait while trying to connect to a server.
         * Use 0 to wait indefinitely.
         */
        RequestOptions::CONNECT_TIMEOUT => 10,

        /*
         * The timeout of the request in seconds. Use 0 to wait indefinitely.
         */
        RequestOptions::TIMEOUT => 10,

        /*
         * Describes the redirect behavior of a request.
         */
        RequestOptions::ALLOW_REDIRECTS => false,
    ],

    /*
     * The sitemap generator can execute JavaScript on each page so it will
     * discover links that are generated by your JS scripts. This feature
     * is powered by headless Chrome.
     */
    'execute_javascript' => false,

    /*
     * The package will make an educated guess as to where Google Chrome is installed.
     * You can also manually pass its location here.
     */
    'chrome_binary_path' => null,

    /*
     * The sitemap generator uses a CrawlProfile implementation to determine
     * which urls should be crawled for the sitemap.
     */
    'crawl_profile' => Profile::class,
];
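These options live in config/sitemap.php. To change them you first have to copy the package's config file into your application; publishing through the package's service provider should do it:

php artisan vendor:publish --provider="Spatie\Sitemap\SitemapServiceProvider"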
To customize which URLs get crawled, point crawl_profile at your own class:

return [

    ...

    /*
     * The sitemap generator uses a CrawlProfile implementation to determine
     * which urls should be crawled for the sitemap.
     */
    'crawl_profile' => CustomCrawlProfile::class,
];
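The CustomCrawlProfile class itself is not shown above; a minimal sketch could look like the following. The App\Crawler namespace and the host check are assumptions for illustration, and the CrawlProfile base class ships with spatie/crawler (its namespace has moved between major versions):

namespace App\Crawler;

use Psr\Http\Message\UriInterface;
use Spatie\Crawler\CrawlProfiles\CrawlProfile;

class CustomCrawlProfile extends CrawlProfile
{
    /*
     * Return true for every URL that should be crawled
     * and included in the sitemap.
     */
    public function shouldCrawl(UriInterface $url): bool
    {
        // Assumption for illustration: only crawl pages on our own host.
        return $url->getHost() === 'example.com';
    }
}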
Instead of a full profile class, a shouldCrawl callback on the generator can filter URLs inline:

use Psr\Http\Message\UriInterface;
use Spatie\Sitemap\SitemapGenerator;

SitemapGenerator::create('https://example.com')
    ->shouldCrawl(function (UriInterface $url) {
        // All pages will be crawled, except the contact page.
        // Links present on the contact page won't be added to the
        // sitemap unless they are present on a crawlable page.
        return strpos($url->getPath(), '/contact') === false;
    })
    ->writeToFile($sitemapPath);
URLs the crawler cannot discover on its own can be appended manually before writing the file:

use Spatie\Sitemap\SitemapGenerator;
use Spatie\Sitemap\Tags\Url;

SitemapGenerator::create('https://example.com')
    ->getSitemap()
    // here we add one extra link, but you can add as many as you'd like
    ->add(Url::create('/extra-page')->setPriority(0.5))
    ->writeToFile($sitemapPath);
A translated version of a page can be attached to its URL as an alternate:

SitemapGenerator::create('https://example.com')
    ->getSitemap()
    // here we add one extra link, but you can add as many as you'd like
    ->add(Url::create('/extra-page')->setPriority(0.5)->addAlternate('/extra-pagina', 'nl'))
    ->writeToFile($sitemapPath);
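In the generated XML an alternate becomes an hreflang annotation on the entry (the urlset element must also declare the xhtml namespace); the output for the URL above should look roughly like this:

<url>
    <loc>https://example.com/extra-page</loc>
    <priority>0.5</priority>
    <xhtml:link rel="alternate" hreflang="nl" href="https://example.com/extra-pagina" />
</url>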