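Problem: bulk-importing large TXT files
I have a batch of TXT files, one per day. Each file is about 800 MB with roughly 500,000 lines, and the data has no usable delimiter.
For example: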
123sdf980werwer498798810290fwer090asdas090900aaadsdas
456asdasadasdasd79879a8s7d9as7d9879s879a7sd98a7s9d7as
789dsda79sd79a8s7d9a8s7d9a8s7d9a8s7d9a8s7d98as79d87a9
......
The useful fields are:
123 0290 090900
456 s7d9 a7sd98
789 d9a8 s7d98a
......
I don't want to import the useless fields into the database. How can I do this, and what is the fastest way?
[Solution]
use Tempdb
go
if not object_id(N'Tempdb..#T') is null
    drop table #T
Go
Create table #T([Col] nvarchar(100))
Insert #T
select N'123sdf980werwer498798810290fwer090asdas090900aaadsdas' union all
select N'456asdasadasdasd79879a8s7d9as7d9879s879a7sd98a7s9d7as' union all
select N'789dsda79sd79a8s7d9a8s7d9a8s7d9a8s7d9a8s7d98as79d87a9'
Go
-- The useful fields sit at fixed positions: 1-3, 24-27 and 40-45
Select [Col1]=LEFT(Col,3), Col2=SUBSTRING(Col,24,4), Col3=SUBSTRING(Col,40,6)
from #T
/*
Col1 Col2 Col3
123  0290 090900
456  s7d9 a7sd98
789  d9a8 s7d98a
*/

-- Read directly from the txt file
-- (OpenRowset with a provider requires the 'Ad Hoc Distributed Queries' option;
--  the outer query assumes the driver exposes each line as a column named Col)
select [Col1]=LEFT(Col,3), Col2=SUBSTRING(Col,24,4), Col3=SUBSTRING(Col,40,6)
from OpenRowset('MSDASQL',
    'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=E:\;',
    'select * from roy.txt')
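An alternative that avoids the ODBC text driver, sketched here under assumptions rather than as a tested recipe, is to bulk-load each raw line into a one-column staging table and then keep only the useful substrings with the same LEFT/SUBSTRING expressions. The table names dbo.RawLine and dbo.Target are hypothetical, E:\roy.txt is the file from the query above, and the positions 1/24/40 come from the sample rows, so they must be adjusted to the real layout. It also assumes the lines contain no tab characters.

```sql
-- Hypothetical staging table: one row per raw line of the daily file.
-- 800 MB / 500,000 lines is about 1.6 KB per line, so varchar(8000) is assumed to be enough.
IF OBJECT_ID(N'dbo.RawLine') IS NULL
    CREATE TABLE dbo.RawLine (Line varchar(8000) NULL);

-- Hypothetical target table holding only the useful fields.
IF OBJECT_ID(N'dbo.Target') IS NULL
    CREATE TABLE dbo.Target (Col1 varchar(3), Col2 varchar(4), Col3 varchar(6));

-- Clear the previous day's raw lines.
TRUNCATE TABLE dbo.RawLine;

-- Load whole lines: with a single-column table each line becomes one value.
-- The default row terminator is CRLF (add ROWTERMINATOR = '0x0a' for LF-only files).
-- The path is resolved on the SQL Server machine, not on the client.
BULK INSERT dbo.RawLine
FROM 'E:\roy.txt'
WITH (TABLOCK);

-- Keep only the useful fixed-position substrings.
INSERT INTO dbo.Target (Col1, Col2, Col3)
SELECT LEFT(Line, 3), SUBSTRING(Line, 24, 4), SUBSTRING(Line, 40, 6)
FROM dbo.RawLine;
```

The useless characters still pass through the staging table once, but only the three extracted columns end up in the permanent table.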
[Solution]
Example: G:\data.txt contains the following:
AAAAAABBBBBCCCDD
1225415612324512
4564q21a3sdfdafe
545f121a7t9d5f65

Create a table:
CREATE TABLE tb(col1 VARCHAR(6), col2 VARCHAR(5), col3 VARCHAR(3), col4 VARCHAR(4));

Generate an XML format file (the !! prefix runs the command from SSMS in SQLCMD mode):
!!bcp MyTest.dbo.tb format nul -f G:\tb_fmt.xml -c -x -Smyfend\liangck -T

The generated G:\tb_fmt.xml has to be edited by hand so that the fields match the fixed widths (fields 1 to 3 become CharFixed). After editing it looks like this:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharFixed" LENGTH="6" COLLATION="Chinese_PRC_90_CI_AS"/>
  <FIELD ID="2" xsi:type="CharFixed" LENGTH="5" COLLATION="Chinese_PRC_90_CI_AS"/>
  <FIELD ID="3" xsi:type="CharFixed" LENGTH="3" COLLATION="Chinese_PRC_90_CI_AS"/>
  <FIELD ID="4" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="2" COLLATION="Chinese_PRC_90_CI_AS"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="col1" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="2" NAME="col2" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="3" NAME="col3" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="4" NAME="col4" xsi:type="SQLVARYCHAR"/>
 </ROW>
</BCPFORMAT>

Import the data with BULK INSERT:
BULK INSERT tb FROM 'G:\data.txt'
WITH (
    FORMATFILE = 'G:\tb_fmt.xml'
);

Check the data:
SELECT * FROM tb;
/*
col1   col2  col3 col4
------ ----- ---- ----
AAAAAA BBBBB CCC  DD
122541 56123 245  12
4564q2 1a3sd fda  fe
545f12 1a7t9 d5f  65

(4 row(s) affected)
*/
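The same format-file technique can skip the unwanted characters entirely: in an XML format file, a FIELD in RECORD that has no matching COLUMN in ROW is read but not imported. A sketch for the question's sample layout, assuming every line uses the same fixed positions as the samples and ends with CRLF: field 1 is 3 useful characters, field 2 is 20 skipped, field 3 is 4 useful, field 4 is 12 skipped, field 5 is 6 useful, and field 6 is the rest of the line, skipped. The file name E:\roy_fmt.xml and the column names are hypothetical.

```xml
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharFixed" LENGTH="3"/>
  <FIELD ID="2" xsi:type="CharFixed" LENGTH="20"/>
  <FIELD ID="3" xsi:type="CharFixed" LENGTH="4"/>
  <FIELD ID="4" xsi:type="CharFixed" LENGTH="12"/>
  <FIELD ID="5" xsi:type="CharFixed" LENGTH="6"/>
  <FIELD ID="6" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="8000"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="Col1" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="3" NAME="Col2" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="5" NAME="Col3" xsi:type="SQLVARYCHAR"/>
 </ROW>
</BCPFORMAT>
```

It would then be used as BULK INSERT dbo.Target FROM 'E:\roy.txt' WITH (FORMATFILE = 'E:\roy_fmt.xml'); where the hypothetical dbo.Target has exactly three columns in the same order as the COLUMN elements. Fields 2, 4 and 6 never reach the table.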