Today, I will explain about “How to get position and values separated by delimiter in a string”. Delimiter may be comma,pipe,colon etc.
STEP1 : Get position of all delimiters in a string. I am explaining with SQL-server as Database.
;with T as (
select 0 as row_id, charindex(‘,’, @string) pos
union all
select pos + 1, charindex(‘,’, @string, pos + 1)
from T
where pos > 0)
select * from T
For example , consider a string “Harsimran,Kaur,91-123-456-7890” . So, it has first name and last name and dummy mobile number. If you execute the above Query for this string, you will get below result.
STEP2 : Store this data in temp table( say @temp_table ) .
STEP3: Calculate the minimum value of position field using below command and store in a variable @min_position and set @var_Location=1
SELECT @min_position =min(pos_id) from @temp_table
STEP4: Read the records present in the temp table one by one using cursor.
STEP5: Apply the below “IF BLOCK” code to get all values separated by a delimiter in a particular string.
IF @pos_id=@min_position
BEGIN
INSERT INTO @temp1
select @var_Location,SUBSTRING(@string,@row_id,len(@String))
END
ELSE
BEGIN
INSERT INTO @temp1
select @var_Location,SUBSTRING(@string,@row_id,@pos_id-@row_id)
END
STEP6: Increment the @Var_Location by one using below command.
set @varComma_Location=@varComma_Location+1
Now, In hive, we have a function call locate which will work same as charindex.
For Example , to get position of delimiter for first occurrence, use the below code.
select locate(‘|’,’123|1234|12345|123456′)
Same way to fetch 2nd and 3rd Occurrence position. Use the below code.
select locate(‘|’,’123|1234|12345|123456′,locate(‘|’,’123|1234|12345|123456′)+1)
select locate(‘|’,’123|1234|12345|123456′,locate(‘|’,’123|1234|12345|123456′,locate(‘|’,’123|1234|123456′)+1)+1)
Once you have position value of all occurrences , you can use substring function to get those values as well.