IMPORT Statement
Overview
The IMPORT statement allows you to import data from external sources into your graph database. It supports importing both vertices (nodes) and edges (relationships) from CSV files stored in S3 or OSS storage.
The IMPORT statement has two main variants:
- IMPORT VERTEX: For importing vertex/node data
- IMPORT EDGE: For importing edge/relationship data
Data Sources
Currently supported data sources:
- Amazon S3
- Alibaba Cloud OSS (Object Storage Service)
File Formats
Currently supported file format:
- CSV (Comma-Separated Values)
Import Process
- Create an import job using the IMPORT VERTEX or IMPORT EDGE statement
- Check the job status using the CHECK JOB command
- Data is imported according to the specified handling options
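The submit-then-poll lifecycle above can be sketched in Python. Note that `execute` is a hypothetical placeholder for whatever client call sends a statement to the database; it is not part of this product's documented API.

```python
# Terminal states a CHECK JOB status object can report (per the status
# values documented below: Running and Pending are non-terminal).
TERMINAL_KINDS = {"Finished", "Failed", "Cancelled"}

def is_terminal(status: dict) -> bool:
    """Return True once a CHECK JOB status object reports a final state."""
    return status.get("kind") in TERMINAL_KINDS

# Typical loop, with `execute` standing in for a hypothetical client call:
#   job_id = execute('IMPORT VERTEX Person ...')
#   while not is_terminal(execute(f'CHECK JOB {job_id}')):
#       time.sleep(1)
```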
IMPORT VERTEX
Syntax
IMPORT VERTEX <vertex_table_name>
COLUMNS (
<vertex_property_name> = $<file_column_index>,
<vertex_property_name> = $<file_column_index>,
...
)
FROM <import_file_source_uri>[,<import_file_source_uri>]+ <sourceOptions>
FORMAT AS CSV <fileFormatOptions>
[ <importOptions> ]
sourceOptions
For S3:
WITH (
REGION = <region>,
ACCESS_KEY_ID = <access_key_id>,
SECRET_ACCESS_KEY = <secret_access_key>
)
For OSS:
WITH (
REGION = <region>,
ACCESS_KEY_ID = <access_key_id>,
SECRET_ACCESS_KEY = <secret_access_key>,
ENDPOINT = <endpoint>
)
fileFormatOptions
(
<key> = <value>,
<key> = <value>,
...
)
For the CSV file format, users can specify the following options:
Key | Description | Value Type | Default Value |
---|---|---|---|
has_header | Whether the file has a header line | Boolean | false |
delimiter | Delimiter to separate the columns | String | "," |
null_value | Recognized spellings for null values. | String | "" |
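The effect of these three options can be illustrated with a small Python sketch built on the standard-library csv module. This models the documented semantics; it is not the server's actual parser.

```python
import csv
import io

def parse_csv(text, has_header=False, delimiter=",", null_value=""):
    """Model of the CSV options above: optionally skip a header row,
    split columns on `delimiter`, and map cells that exactly match
    `null_value` to None (a null)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if has_header and rows:
        rows = rows[1:]  # has_header = true: drop the first line
    return [[None if cell == null_value else cell for cell in row]
            for row in rows]

sample = "id|name\n1|Alice\n2|\n"
print(parse_csv(sample, has_header=True, delimiter="|"))
# [['1', 'Alice'], ['2', None]]
```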
importOptions
PROPERTIES (
<key> = <value>,
<key> = <value>,
...
)
Key | Description | Value Type | Default Value |
---|---|---|---|
duplicate_vertex_handling | How to handle duplicate vertices encountered during import. | "fail" : Fail the job. "overwrite" : Overwrite the duplicated vertex value. "ignore" : Ignore the vertex. | "ignore" |
log_problematic_lines | Whether to log problematic lines. | Boolean | false |
format_error_handling | How to handle malformed lines. | "fail" : Fail the job. "ignore" : Skip the malformed line. | "ignore" |
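The three duplicate_vertex_handling modes can be sketched in Python over a list of rows keyed by their primary-key column. This is an illustration of the documented behavior, not the engine's implementation.

```python
def apply_duplicate_handling(rows, key_index=0, mode="ignore"):
    """Model of duplicate_vertex_handling: "fail" raises on a duplicate
    key, "overwrite" keeps the last copy seen, "ignore" keeps the first."""
    seen = {}
    for row in rows:
        key = row[key_index]
        if key in seen:
            if mode == "fail":
                raise ValueError(f"duplicate vertex key: {key}")
            if mode == "overwrite":
                seen[key] = row   # last copy wins
            # mode == "ignore": keep the first copy, drop this row
        else:
            seen[key] = row
    return list(seen.values())

rows = [["1", "Alice"], ["1", "Alicia"], ["2", "Bob"]]
print(apply_duplicate_handling(rows, mode="ignore"))     # [['1', 'Alice'], ['2', 'Bob']]
print(apply_duplicate_handling(rows, mode="overwrite"))  # [['1', 'Alicia'], ['2', 'Bob']]
```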
Example
IMPORT VERTEX Person COLUMNS("col1" = $0, "col2" = $1, "col3" = $2)
FROM "s3://kasma-fileio-ci/tinysoc/vPerson.csv"
WITH (
region = "xxx",
access_key_id = "xxx",
secret_access_key = "xxx"
)
FORMAT AS CSV (has_header = true, delimiter = ",")
PROPERTIES (
duplicate_vertex_handling = "ignore",
log_problematic_lines = true,
format_error_handling = "ignore"
)
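For reference, a file compatible with the example above (has_header = true, three columns mapped as $0, $1, $2) might look like the following; the header names and row values here are invented for illustration, not taken from the actual vPerson.csv.

```
col1,col2,col3
1,Alice,30
2,Bob,25
```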
Example of importing from OSS
IMPORT VERTEX Person COLUMNS("col1" = $0, "col2" = $1, "col3" = $2)
FROM "oss://kasma-fileio-ci/tinysoc/vPerson.csv"
WITH (
region = "xxx",
access_key_id = "xxx",
secret_access_key = "xxx",
endpoint = "https://oss-cn-hongkong.aliyuncs.com"
)
FORMAT AS CSV (has_header = true, delimiter = "," )
IMPORT EDGE
Syntax
IMPORT EDGE <edge_table_name>
FROM (
<source_vertex_primary_key_name> = $<file_column_index>,
<source_vertex_primary_key_name> = $<file_column_index>,
...
)
TO (
<target_vertex_primary_key_name> = $<file_column_index>,
<target_vertex_primary_key_name> = $<file_column_index>,
...
)
COLUMNS (
<edge_property_name> = $<file_column_index>,
<edge_property_name> = $<file_column_index>,
...
)
FROM <import_file_source_uri>[,<import_file_source_uri>]+ <sourceOptions>
FORMAT AS CSV <fileFormatOptions>
[ <importOptions> ]
sourceOptions
Same as IMPORT VERTEX.
importOptions
PROPERTIES (
<key> = <value>,
<key> = <value>,
...
)
Key | Description | Value Type | Default Value |
---|---|---|---|
incident_vertex_not_exists_handling | How to handle cases where the edge endpoint vertex does not exist. | "fail" : Fail the job. "ignore" : Ignore the edge. | "ignore" |
log_problematic_lines | Whether to log problematic lines. | Boolean | false |
format_error_handling | How to handle malformed lines. | "fail" : Fail the job. "ignore" : Skip the malformed line. | "ignore" |
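The incident_vertex_not_exists_handling modes can likewise be sketched in Python: given the set of vertex ids already in the graph, an edge whose source or target is missing is either dropped or causes a failure. Again, this models the documented semantics rather than the engine itself.

```python
def apply_endpoint_handling(edges, existing_ids, mode="ignore"):
    """Model of incident_vertex_not_exists_handling: keep an edge only if
    both endpoints exist; otherwise "fail" raises and "ignore" skips it."""
    kept = []
    for src, dst in edges:
        if src in existing_ids and dst in existing_ids:
            kept.append((src, dst))
        elif mode == "fail":
            raise ValueError(f"missing endpoint for edge {src}->{dst}")
        # mode == "ignore": silently skip the edge
    return kept

edges = [("1", "2"), ("1", "9")]            # vertex "9" does not exist
print(apply_endpoint_handling(edges, existing_ids={"1", "2"}))  # [('1', '2')]
```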
Example
IMPORT EDGE Knows FROM ("col1" = $0) TO ("col1" = $1)
COLUMNS("col1" = $2)
FROM "s3://kasma-fileio-ci/tinysoc/eKnows.csv"
WITH (region = "xxx", access_key_id = "xxx", secret_access_key = "xxx" )
FORMAT AS CSV (has_header = true, delimiter = "," )
PROPERTIES (
incident_vertex_not_exists_handling = "fail",
log_problematic_lines = true,
format_error_handling = "ignore"
)
Check Import Job Status
The IMPORT statement returns a job_id, which can be used to check the import status.
CHECK JOB <job_id>
The result is a JSON object with a required kind field, which indicates the job status. The possible values are:
- Running : The job is running, and the status object contains no other fields.
- Finished : The job finished successfully. The status object contains the following fields:
  - load_data_rows : The number of rows loaded.
  - load_data_bytes : The number of bytes loaded.
  - job_start_ts : The start timestamp of the job.
  - job_end_ts : The end timestamp of the job.
  - warning : The warning message, if any.
- Failed : The job failed. The status object contains the following field:
  - reason : The error reason.
- Cancelled : The job was cancelled.
- Pending : The job is pending, waiting to be scheduled.
Example
Job finished successfully.
{
"kind" : "Finished",
"warning": null,
"load_data_rows": 8,
"load_data_bytes": 2368,
"job_start_ts":1756973976062646,
"job_end_ts":1756973976841641
}
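The timestamps in the sample above appear to be microseconds since the Unix epoch; that unit is an assumption inferred from the magnitude of the values, not stated by the documentation. Under that assumption, the job's duration can be computed from the status object like this:

```python
def job_duration_seconds(status: dict) -> float:
    """Duration of a Finished job from its CHECK JOB status object.
    Assumes job_start_ts / job_end_ts are microseconds since the epoch
    (inferred from the sample values; not documented)."""
    if status.get("kind") != "Finished":
        raise ValueError("duration is only defined for Finished jobs")
    return (status["job_end_ts"] - status["job_start_ts"]) / 1_000_000

status = {
    "kind": "Finished",
    "warning": None,
    "load_data_rows": 8,
    "load_data_bytes": 2368,
    "job_start_ts": 1756973976062646,
    "job_end_ts": 1756973976841641,
}
print(round(job_duration_seconds(status), 3))  # 0.779
```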
BLOCKING IMPORT
Sometimes we only want to submit the job, wait for it to finish, and get the result in a single call. To do this, add the BLOCKING keyword to the import statement.
BLOCKING IMPORT VERTEX Person COLUMNS("col1" = $0, "col2" = $1, "col3" = $2)
FROM "s3://kasma-fileio-ci/tinysoc/vPerson.csv"
WITH (
region = "xxx",
access_key_id = "xxx",
secret_access_key = "xxx"
)
FORMAT AS CSV (has_header = true, delimiter = ",")
PROPERTIES (
duplicate_vertex_handling = "ignore",
log_problematic_lines = true,
format_error_handling = "ignore"
)
In this way, the import statement will not return until the job is finished.