Help splitting up a document file with product descriptions in it...
I have a .doc file containing English and Chinese text. It holds product descriptions, which are separated in the document by four-digit numbers, i.e. 0001, 0002, 0003, 0004, 0005 and so on.
For example:
0001
技术参数
电压:AC90V-120V/220V-240V 50-60HZ
功率:400W
光源:120PCS 1W/3W LEDS
(R:30pcs,G:30pcs,B:30psc,W:30pcs)
控制通道:12通道
运行模式:主从,自走,声控,DMX512
每颗LED的理论寿命为50000-100000时
光学透镜角度标准15度
水平扫描:540度,垂直扫描270度
可以调节扫描速度
无限的RGBW颜色混色系统
显示操作面板彩用LCD显示屏
产品尺寸:515*402*555mm
净重:19kg 毛重:21kg
TECHNICAL PARAMETER
Voltage: AC90V-120V or 200V-240V 50-60HZ
Power consumption:400W
Light source:120PCS 1W or 3W LED
(R:30pcs,G:30pcs,B:30psc,W:30pcs)
Control mode:12HS
Operation mode: master-slave, auto movement,
Sound control: DMX 512
Each led source has an expectancy over 50000 to 100000 hours in theory
Optical len angle:15 degrees
Level scanning:540 degrees Vertical scanning
270 degrees, speed adjustable
Indefinite RGBW color mixing system
LCD display adopted
Product size:512*402*555mm
N.W:19kg G.W:21kg
0002
技术参数
电压:AC100V-240V,50/60HZ
功率:360W
光源:108颗 1/3W LED
运行模式:主从,自走,声控,DMX512
控制通道:11通道
水平扫描:540度,垂直扫描270度
高度电子调光,频闪可达1-20次/秒
均匀的RGB混色系统和彩虹效果(可加白色)
光斑角度:15度
包装尺寸:420*330*550mm
净重:10kg 毛重:13kg
TECHNICAL PARAMETER
Voltage:AC100V-240V ,50/60HZ
Power consumption:306W
Light source:108pcs of 1/3W LED
Operation mode master-slave, sound control,
auto movement,DMX512
Control channel:11Hs
Level scanning angle:540 degrees
Vertical scanning angle:270degrees
Quick electronic dimmer, strobe from 1 to 20 times/second
Smooth RGB mixing system &
Rainbow effect(can add white)
Beam angle:15 degrees
Package size :420*330*550mm
N.W:10kg G.W:13kg
0003
技术参数
电压:AC90V-120V,200V-250V,50/60HZ
光束角:10度,15度,25度可选。
控制通道:11通道
预期使用寿命:50000小时
最低的能量消耗。
信号控制:12个标准DMX 12通道控制,独立的主从控制。
频闪:1-18次/秒
LED显示。
内置程序:内置的8个程序能被DMX控制激活。
尺寸:307*354*267mm
净重:8.7kg
符合GB7000.1-2007.GB7000.217-2008及CE标准
TECHNICAL PARAMETER
Power supply:AC100V-120V.200V-250V.50/60Hz
Angle of light beam:10。15。
25。 Are available for choice.
Control channel:11
Service life:50000 hours
The lowest power consumption
Control signal 12 Standard DMX controlling
Channels and ant channels combination
Can be sep up.
Independent master/slave control
Strobe:1-18 flash per second
Inside program: the 8 inside program can
be activated by DMX controller
Dimensions:307*354*267mm
N.W:8.7kg
Up to CE standard. UL standard and
GB 7000.15-2000standard
Any ideas on the best way to split it up and put it into a database?
Thanks
Lee
1. mb_split() is meant for multibyte strings; preg_split() is byte-oriented unless given the u modifier
Use mb_split(). Note that its pattern is written without delimiters:
$descriptions = mb_split('\d{4}', $text);
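A minimal sketch of the mb_split() route, assuming the mbstring extension (including its regex functions) is enabled and the text has already been exported from the .doc as UTF-8 plain text. The thing to watch is that mb_split() takes a bare pattern, without the /.../ delimiters the preg_* functions need. The sample string is a shortened piece of the data from the question.

```php
<?php
// Assumption: mbstring is available and $text is UTF-8 plain text.
mb_regex_encoding('UTF-8');

$text = "0001\n技术参数\n功率:400W\n0002\n技术参数\n功率:360W\n";

// Bare pattern, no "/" delimiters. It is not anchored to the start of a
// line, so any line that ends in four digits would also act as a separator.
$parts = mb_split('\d{4}\n', $text);

// The text starts with a separator, so drop any empty piece.
$descriptions = array_values(array_filter($parts, 'strlen'));

print_r($descriptions);
```

The product numbers themselves are discarded by the split, so this only works if the order of the pieces is enough to identify them.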
2. Loop through the file
Another approach, which may avoid running PHP functions that are not multibyte-safe over the text and mangling the Chinese portions:
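$file = file('/file/path');
$descriptions = array();
$description_counter = 0;
foreach ($file as $line) {
    $line = trim($line);
    // A line made up of exactly four digits starts a new description.
    if (preg_match('/^\d{4}$/', $line)) {
        $description_counter++;
    }
    // Create the slot before appending, to avoid an undefined index warning.
    if (!isset($descriptions[$description_counter])) {
        $descriptions[$description_counter] = '';
    }
    $descriptions[$description_counter] .= $line . "\n";
}
print_r($descriptions);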
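Since the goal is a database, it may help to key each record by its product number and keep the Chinese and English halves apart. The following is only a sketch: parse_products() is a hypothetical helper, and it assumes every English block starts with the line "TECHNICAL PARAMETER", as in the sample data.

```php
<?php
// Hypothetical helper: groups lines into records keyed by the 4-digit number.
// Assumes the English half of each record begins with "TECHNICAL PARAMETER".
function parse_products(array $lines): array
{
    $products = [];
    $id = null;     // current product number, e.g. "0001"
    $lang = 'zh';   // text before the English heading is treated as Chinese

    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if (preg_match('/^\d{4}$/', $line)) {
            // A line of exactly four digits starts a new record.
            $id = $line;
            $lang = 'zh';
            $products[$id] = ['zh' => '', 'en' => ''];
            continue;
        }
        if ($id === null) {
            continue; // ignore anything before the first number
        }
        if ($line === 'TECHNICAL PARAMETER') {
            $lang = 'en';
        }
        $products[$id][$lang] .= $line . "\n";
    }
    return $products;
}

$sample = [
    "0001", "技术参数", "功率:400W", "TECHNICAL PARAMETER", "Power consumption:400W",
    "0002", "技术参数", "功率:360W", "TECHNICAL PARAMETER", "Power consumption:306W",
];
$products = parse_products($sample);
print_r($products);
```

Each entry then maps naturally onto one table row (number, Chinese text, English text), which could be written with a prepared INSERT statement. With a real file, file('/file/path') would supply the $lines array.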
Copy the text into $text and use:
$r = preg_split("(\n\d{4}\n)", $text);
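A variant of the same idea, as a sketch under the assumption that $text is UTF-8: anchoring the pattern to whole lines also catches the first number (which has no newline in front of it), and capturing the number keeps it alongside its description instead of throwing it away.

```php
<?php
$text = "0001\n技术参数\n功率:400W\n0002\n技术参数\n功率:360W\n";

// m: ^ matches at the start of every line; u: treat the subject as UTF-8;
// \R matches \n, \r\n or \r, so Windows line endings work too.
$parts = preg_split(
    '/^(\d{4})\R/mu',
    $text,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

// $parts alternates: number, body, number, body, ...
// (this pairing assumes no record has an empty body)
$byNumber = [];
for ($i = 0; $i + 1 < count($parts); $i += 2) {
    $byNumber[$parts[$i]] = trim($parts[$i + 1]);
}
print_r($byNumber);
```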