Fetching HTML Content with a PHP Crawler

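# Intro

It has been a while since I last wrote a crawler… This time I took over code from the previous developers, and it is written in PHP. After studying how it works, I found that fetching pages can now be done without any third-party packages, so I am noting it down here.

# Fetching HTML content

- Using curl
- Using file_get_contents

curl is what I normally use. Only after reading the existing code did I learn that file_get_contents can also fetch http/https content… Here are quick examples of both approaches.

## curl

```php
function httpGet($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Request headers belong in CURLOPT_HTTPHEADER;
    // CURLOPT_HEADER is a boolean that only toggles response headers in the output
    curl_setopt($ch, CURLOPT_HTTPHEADER, [
        'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'
    ]);
    $output = curl_exec($ch);
    curl_close($ch);
    return $output;
}
```

## file_get_contents

```php
function htmlContentGet($url)
{
    $opts = [
        "http" => [
            "method" => "GET",
            "header" => "User-Agent: Mozilla/5. ...
```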
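The file_get_contents example in this excerpt is cut off, so the following is only a minimal sketch of that approach and not the original code. The function names `buildHttpOptions` and `fetchHtml`, the default User-Agent string, the timeout, and the `ignore_errors` setting are all assumptions made for illustration. It also assumes `allow_url_fopen` is enabled in `php.ini`, which file_get_contents needs in order to open http/https URLs.

```php
<?php
// Sketch only: helper names and default values are hypothetical.

/**
 * Build stream-context options for a plain GET request.
 * The "http" key is used by both the http:// and https:// wrappers.
 */
function buildHttpOptions(string $userAgent, int $timeoutSeconds = 10): array
{
    return [
        'http' => [
            'method'        => 'GET',
            'header'        => "User-Agent: {$userAgent}\r\n",
            'timeout'       => $timeoutSeconds,
            'ignore_errors' => true, // still return the body on 4xx/5xx responses
        ],
    ];
}

/**
 * Fetch a URL with file_get_contents.
 * Returns the body as a string, or null if the request failed.
 */
function fetchHtml(
    string $url,
    string $userAgent = 'Mozilla/5.0 (compatible; ExampleCrawler/1.0)'
): ?string {
    $context = stream_context_create(buildHttpOptions($userAgent));
    // @ suppresses the warning emitted on failure; the false return is checked instead
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

A call such as `$html = fetchHtml('https://example.com/');` would then return the page body or `null`. Keeping the options in a separate function makes the request settings easy to inspect without touching the network.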

2022-04-22

/posts/php_crawler_html/

PHP
Crawler

Tedshd's Dev note

Category: Crawler
