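The Japanese radiation data is published in both CSV and XML formats. I had already built an import tool for the CSV files, and now I have received a data set in XML.
To keep using the CSV tooling, I wrote a simple XML-to-CSV converter.
Suppose the XML looks like this: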
<?xml version="1.0" encoding="utf-8"?>
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" lang="ja" xsi:noNamespaceSchemaLocation="standardizedXML_1.0.0_rc1.xsd">
  <projectId>1</projectId>
  <title>第1次走行サーベイによる空間線量率の測定結果(H23.6.6~H23.6.13)</title>
  <conductors>文部科学省</conductors>
  <sites>
    <site id="17_5539_3274_69">
      <meshID>5539_3274_69</meshID>
      <location>
        <geometry>
          <area>
            <coordinates>139.311250,36.980833 139.312500,36.980833 139.312500,36.980000 139.311250,36.980000 139.311250,36.980833</coordinates>
          </area>
        </geometry>
        <representativePoint>
          <latitude>36.980416</latitude>
          <longitude>139.311875</longitude>
          <latitude_60>36°58'49.5"</latitude_60>
          <longitude_60>139°18'42.8"</longitude_60>
          <distance unit="km">160.1</distance>
        </representativePoint>
        <cityCode>07364</cityCode>
        <prefecture>福島県</prefecture>
        <cityName>南会津郡桧枝岐村</cityName>
      </location>
      <occasion>
        <date>
          <date>2011-06-06</date>
          <timeZone>+09:00</timeZone>
        </date>
      </occasion>
      <representativeData>
        <doseMeasurements>
          <doseMeasurement>
            <doserates>
              <doserateInformation>
                <doserate>
                  <data>
                    <unit>μSv/h</unit>
                    <statistics>
                      <n>2</n>
                      <mean>1.0E-01</mean>
                    </statistics>
                  </data>
                  <isEffectiveDose>false</isEffectiveDose>
                  <isEquivalentDose>false</isEquivalentDose>
                </doserate>
              </doserateInformation>
            </doserates>
          </doseMeasurement>
        </doseMeasurements>
      </representativeData>
    </site>
    <site id="17_5539_3274_77">
      <meshID>5539_3274_77</meshID>
      <location>
        <geometry>
          <area>
            <coordinates>139.308750,36.981667 139.310000,36.981667 139.310000,36.980833 139.308750,36.980833 139.308750,36.981667</coordinates>
          </area>
        </geometry>
        <representativePoint>
          <latitude>36.981250</latitude>
          <longitude>139.309375</longitude>
          <latitude_60>36°58'52.5"</latitude_60>
          <longitude_60>139°18'33.8"</longitude_60>
          <distance unit="km">160.3</distance>
        </representativePoint>
        <cityCode>07364</cityCode>
        <prefecture>福島県</prefecture>
        <cityName>南会津郡桧枝岐村</cityName>
      </location>
      <occasion>
        <date>
          <date>2011-06-06</date>
          <timeZone>+09:00</timeZone>
        </date>
      </occasion>
      <representativeData>
        <doseMeasurements>
          <doseMeasurement>
            <doserates>
              <doserateInformation>
                <doserate>
                  <data>
                    <unit>μSv/h</unit>
                    <statistics>
                      <n>2</n>
                      <mean>1.0E-01</mean>
                    </statistics>
                  </data>
                  <isEffectiveDose>false</isEffectiveDose>
                  <isEquivalentDose>false</isEquivalentDose>
                </doserate>
              </doserateInformation>
            </doserates>
          </doseMeasurement>
        </doseMeasurements>
      </representativeData>
    </site>
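The idea is to use a regular expression to count how many times each tag occurs, sort the counts, and pick out the tags that repeat the most,
such as unit, n, timeZone, coordinates, lat..., lon, and so on. These are the tags that appear once in every record.
Their values are then regrouped into the columns of the new CSV file.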
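The counting step can be sketched on a tiny inline sample. This is only an illustration under assumptions: the miniature XML, the stricter leaf-element pattern, and the variable names (`$tagCounts`, `$countFrequency`, `$recordCount`, `$columns`) are hypothetical, and `array_key_first` needs PHP 7.3 or later.

```php
<?php
// Hypothetical miniature input; a real file holds many <site> records.
$xml = <<<XML
<project>
<title>demo</title>
<sites>
<site id="a">
<meshID>5539_3274_69</meshID>
<unit>μSv/h</unit>
<n>2</n>
</site>
<site id="b">
<meshID>5539_3274_77</meshID>
<unit>μSv/h</unit>
<n>2</n>
</site>
</sites>
</project>
XML;

// Match leaf elements only: <tag attrs>text</tag> with no nested markup.
preg_match_all('/<([^<>\/][^<>]*)>([^<>]*)<\/[^<>]+>/', $xml, $m);

// How many times does each tag occur?
$tagCounts = array_count_values($m[1]);

// How many tags share each occurrence count? The count shared by the
// most tags is taken to be the number of records.
$countFrequency = array_count_values($tagCounts);
arsort($countFrequency);
$recordCount = array_key_first($countFrequency);

// Tags that occur exactly once per record become the CSV columns.
$columns = array_keys($tagCounts, $recordCount);

echo "records: {$recordCount}\n";                 // records: 2
echo "columns: " . implode(',', $columns) . "\n"; // columns: meshID,unit,n
```

Here `title` occurs once while `meshID`, `unit`, and `n` each occur twice, so the most common count is 2 and those three tags are selected as columns.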
The result is this small, easy-to-use script:
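<?php
// Convert a flat, record-oriented XML file to CSV.
$program_name = basename(__FILE__);
$HELP_MESSAGE = "
XML to CSV.

Example:

    php {$program_name} [input xml file] [output csv file]

";
if ($argc != 3) {
    echo $HELP_MESSAGE;
    exit(1);
}
$data = file_get_contents($argv[1]);
if ($data === false) {
    echo "Cannot read {$argv[1]}\n";
    exit(1);
}
$csv_output = $argv[2];

// Swap double quotes and commas for their full-width forms so that values
// (e.g. <coordinates>) cannot break the CSV layout. The comma replacement
// was a no-op as published; the full-width comma is the presumed intent.
$data = str_replace('"', '”', $data);
$data = str_replace(',', ',', $data);

// Collect every leaf element: <tag attrs>text</tag> on a single line.
preg_match_all('/<([^<>\/][^<>]*)>([^<>\r\n]*)<\/[^<>]+>/', $data, $all);

// Count how often each tag occurs, then find the occurrence count shared
// by the most tags; that count is taken as the number of records.
$all_tags_counter = array_count_values($all[1]);
$csv_columns_counter = array_count_values($all_tags_counter);
arsort($csv_columns_counter);
$base_c = key($csv_columns_counter);

$header = array();   // column titles, in order of first appearance
$columns = array();  // column title => list of values, one per record
foreach ($all_tags_counter as $k => $c) {
    if ($c == $base_c) {
        // Tags that occur once per record become CSV columns.
        $title = str_replace(array('"', "'"), '', $k);
        $header[] = $title;
        // The closing tag carries the element name only, without attributes.
        $mk = explode(' ', $k);
        $k_end = current($mk);
        // preg_quote (not addslashes) escapes regex metacharacters in the tag.
        $pattern = '/<' . preg_quote($k, '/') . '>([^<>\r\n]*)<\/' . preg_quote($k_end, '/') . '>/';
        preg_match_all($pattern, $data, $matches);
        // Key by title so the lookup in the write loop always matches.
        $columns[$title] = $matches[1];
    }
}

// Write the header, then the rows in buffered chunks.
$out = fopen($csv_output, 'w');
fwrite($out, implode(',', $header));
$tmp = '';
for ($i = 0; $i < $base_c; $i++) {
    $d = array();
    foreach ($header as $title) {
        $d[] = $columns[$title][$i];
    }
    // Prefix each row with the newline so the file ends without one.
    $tmp .= "\n" . implode(',', $d);
    // Flush the buffer and report progress every 1000 rows.
    if ($i % 1000 == 0) {
        fwrite($out, $tmp);
        $tmp = '';
        echo "Process... {$i} /{$base_c} \n";
    }
}
if ($tmp != '') {
    fwrite($out, $tmp);
    echo "Process... {$i} /{$base_c} \n";
}
fclose($out);
echo "Done...\n";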
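The script above rewrites quote characters in the data so that values cannot break the CSV layout, and values such as the `<coordinates>` element also contain commas. As an alternative, and only as a sketch of a different technique rather than what the script does, PHP's built-in `fputcsv` can quote such fields instead. The `$header` and `$row` values below are hypothetical, shortened from the sample data.

```php
<?php
// Hypothetical record; the coordinates value is shortened from the sample.
$header = ['meshID', 'coordinates', 'prefecture'];
$row = [
    '5539_3274_69',
    '139.311250,36.980833 139.312500,36.980833',
    '福島県',
];

// Write to an in-memory stream so the sketch leaves no files behind.
$stream = fopen('php://memory', 'w+');
// Delimiter, enclosure, and escape are passed explicitly.
fputcsv($stream, $header, ',', '"', '\\');
fputcsv($stream, $row, ',', '"', '\\');

rewind($stream);
$csv = stream_get_contents($stream);
fclose($stream);
echo $csv;

// Reading the data line back yields the original three fields,
// with the embedded commas intact.
$lines = explode("\n", trim($csv));
$parsed = str_getcsv($lines[1], ',', '"', '\\');
```

The trade-off is that the output then contains quoted fields, so whatever consumes the CSV must understand standard CSV quoting rather than splitting on commas.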
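Usage: run the script with php, passing the input XML file and the output CSV file as its two arguments.
A 70 MB XML file converts to CSV in only about 5 seconds.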