Loading and QC'ing LAS files in R
LAS files
The LAS format, short for Log ASCII Standard, is widely used in the oil and gas industry. In this notebook we show a short workflow to load and QC multiple files. I'm using some files donated by Geolink to the geoscience community.
The libraries
I will use three libraries:
the tidyverse, to perform data wrangling and plotting,
petroreadr from Ravenroadresources, to load the LAS files,
skimr, to get an excellent summary of the data.
This script can be found on my GitHub repo.
Loading the files
The library documentation describes a few options. I normally choose to:
- load directly to a dataframe, which is my go-to mode,
- set the verbose option to TRUE, which is particularly useful when working with a large number of wells/logs, for the peace of mind that the machine is working and has not hung.
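library(tidyverse)
library(petroreadr)
library(skimr)
pathname <- "../../data/GEOLINK_Lithology and wells NORTH SEA/"
# keep only file names ending in .las (escaped dot, anchored at the end)
lasfiles <- list.files(pathname, pattern = "\\.las$")
df <- read_las(file.path(pathname, lasfiles), verbose = TRUE)$data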
## + 34_10-12.las imported as <las> object
## + 34_3-2 S.las imported as <las> object
## + 35_3-2.las imported as <las> object
## + 35_9-2.las imported as <las> object
## + 35_9-9.las imported as <las> object
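A note on the file filter: grepl() and the pattern argument of list.files() both treat the pattern as a regular expression, in which an unescaped dot matches any character. The sketch below uses hypothetical file names (they are not part of the Geolink dataset) to show how a loose pattern and an escaped, anchored pattern differ.

```r
# Hypothetical file names, for illustration only
files <- c("34_10-12.las", "atlas_notes.txt", "35_9-2.las.bak", "35_3-2.las")

# Unescaped dot: "." matches any character, so "atlas" and ".las.bak" also match
loose <- files[grepl(".las", files)]

# Escaped dot plus end-of-string anchor: only names that end in ".las"
strict <- files[grepl("\\.las$", files)]

print(loose)
print(strict)
```

With a folder that only contains LAS files both patterns give the same result, so this matters mainly when other files live next to the logs.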
The loaded data
We have obtained a dataframe with all logs, and a column with the well name that makes our life easier for plotting, filtering, etc.
Table: Data summary
Name | df |
Number of rows | 73611 |
Number of columns | 22 |
_______________________ | |
Column type frequency: | |
character | 1 |
numeric | 21 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
WELL | 0 | 1 | 6 | 8 | 0 | 5 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
DEPT | 0 | 1.00 | 2410.31 | 969.88 | 215.22 | 1711.01 | 2425.34 | 3082.65 | 4407.62 | ▂▆▇▆▃ |
Lithology_geolink | 21036 | 0.71 | 7.07 | 3.81 | 1.00 | 5.00 | 6.00 | 7.00 | 18.00 | ▂▇▁▃▁ |
CALI | 3029 | 0.96 | 11.92 | 2.57 | 2.02 | 8.84 | 12.50 | 13.31 | 26.73 | ▁▅▇▁▁ |
DRHO | 3418 | 0.95 | 0.02 | 0.06 | -2.57 | 0.00 | 0.01 | 0.04 | 1.27 | ▁▁▁▇▁ |
NPHI | 9314 | 0.87 | 0.30 | 0.11 | -0.06 | 0.22 | 0.30 | 0.38 | 0.87 | ▁▇▇▁▁ |
RHOB | 3487 | 0.95 | 2.38 | 0.21 | -1.71 | 2.26 | 2.43 | 2.53 | 3.26 | ▁▁▁▃▇ |
GR | 16 | 1.00 | 70.61 | 27.80 | -197.12 | 48.90 | 70.44 | 90.75 | 866.92 | ▁▇▁▁▁ |
DTC | 9599 | 0.87 | 104.65 | 25.45 | -16.58 | 85.75 | 98.96 | 121.83 | 265.00 | ▁▇▇▁▁ |
RDEP | 394 | 0.99 | 11.36 | 147.15 | 0.34 | 1.34 | 1.90 | 3.51 | 29270.71 | ▇▁▁▁▁ |
SP | 27351 | 0.63 | 83.83 | 46.84 | -279.13 | 51.63 | 88.53 | 112.79 | 178.31 | ▁▁▁▇▇ |
RSHA | 37489 | 0.49 | 7.85 | 43.40 | 0.13 | 1.00 | 2.01 | 3.87 | 1770.00 | ▇▁▁▁▁ |
RMED | 391 | 0.99 | 9.68 | 76.45 | 0.13 | 1.37 | 2.14 | 4.06 | 9700.00 | ▇▁▁▁▁ |
BS | 34035 | 0.54 | 10.94 | 1.93 | 8.50 | 8.50 | 12.25 | 12.25 | 17.50 | ▅▁▇▁▁ |
ROP | 46350 | 0.37 | 93.21 | 92.07 | 5.20 | 47.13 | 79.69 | 125.87 | 1290.77 | ▇▁▁▁▁ |
THOR | 70499 | 0.04 | 11.17 | 4.59 | 1.13 | 7.42 | 11.27 | 14.53 | 33.71 | ▅▇▅▁▁ |
PEF | 47126 | 0.36 | 5.51 | 4.83 | 1.35 | 3.89 | 4.64 | 6.86 | 667.36 | ▇▁▁▁▁ |
URAN | 70499 | 0.04 | 2.26 | 1.44 | -0.54 | 1.57 | 2.26 | 2.80 | 27.93 | ▇▁▁▁▁ |
DTS | 53766 | 0.27 | 172.42 | 34.20 | 87.69 | 141.12 | 173.26 | 201.25 | 270.41 | ▂▇▇▆▁ |
DCAL | 61296 | 0.17 | 0.77 | 1.39 | -4.81 | 0.02 | 0.30 | 0.83 | 7.88 | ▁▇▇▁▁ |
SGR | 67154 | 0.09 | 101.56 | 32.54 | 33.15 | 81.72 | 106.35 | 116.22 | 892.06 | ▇▁▁▁▁ |
RMIC | 71425 | 0.03 | 14.44 | 38.00 | 0.72 | 4.74 | 7.36 | 12.76 | 556.37 | ▇▁▁▁▁ |
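The summary above pools all wells together, so a low complete_rate does not say which wells are missing a log. As a rough sketch, the same idea can be computed per well with dplyr. The toy data frame and its values are made up for illustration; with the real data one would pass df instead.

```r
library(dplyr)

# Toy stand-in for the loaded data frame; the values are made up
toy <- tibble(
  WELL = c("A", "A", "A", "B", "B", "B"),
  GR   = c(55, 60, NA, 80, 85, 90),
  DTS  = c(NA, NA, NA, 170, 175, NA)
)

# Fraction of non-missing samples per well and per log
coverage <- toy %>%
  group_by(WELL) %>%
  summarise(across(everything(), ~ mean(!is.na(.x))))

print(coverage)
```

A value of 0 for a given well and log means the log was not recorded in that well at all.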
Quick view of a log
To check whether a well log is present in all wells, and to compare some of its statistics between wells, the box plot is a powerful tool.
box_1 <-
  df %>%
  ggplot(aes(WELL, GR)) +
  geom_boxplot()
# print the plot
box_1
## Warning: Removed 16 rows containing non-finite values (stat_boxplot).
A quick view shows some negative values in well 34/10-12 and some potential outliers in well 35/9-2. We can repeat the plot, cropping it to a range, to get a better view.
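The text does not say how the crop was done, so here is a minimal sketch of two common options in ggplot2, using a made-up toy data frame. The range 0 to 200 is an assumption chosen for illustration.

```r
library(ggplot2)

# Toy data with one negative value and one large outlier (made-up numbers)
toy <- data.frame(
  WELL = rep(c("A", "B"), each = 6),
  GR   = c(-150, 40, 55, 60, 70, 90, 45, 65, 75, 80, 95, 850)
)

# Option 1: zoom only; the boxplot statistics still use every sample
p_zoom <- ggplot(toy, aes(WELL, GR)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 200))

# Option 2: drop out-of-range samples first; the statistics change accordingly
toy_crop <- subset(toy, GR > 0 & GR < 200)
p_crop <- ggplot(toy_crop, aes(WELL, GR)) +
  geom_boxplot()

p_zoom
p_crop
```

Zooming keeps the quartiles honest but hides the outliers, while filtering gives a view of the data as it would look after removing them.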
We can repeat the same for all logs at once to get a quick idea; however, there are a few points to consider when doing this:
the size of the plot will need to be increased to properly show all logs,
if you want to zoom in on a particular log, you will need to apply a filter to the data before applying the pivot,
similarly, when dealing with logs that behave logarithmically, I don't know how to set a logarithmic scale for a particular log using faceting, so I apply a log transform before the pivoting step.
df %>%
  # log-transform the resistivity before pivoting
  mutate(log10_RDEP = log10(RDEP)) %>%
  select(-RDEP) %>%
  # zoom on GR while it is still its own column
  filter(GR > 0,
         GR < 200) %>%
  pivot_longer(!c("WELL"), names_to = "logs", values_to = "value") %>%
  drop_na(value) %>%
  ggplot() +
  geom_boxplot(aes(WELL, value)) +
  facet_wrap(~logs, scales = "free", ncol = 1)
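To make the order of the steps concrete, here is a small sketch on a made-up data frame showing what the pivot produces. After the pivot there is no GR column any more, only a logs column and a value column, which is why the filter and the log transform have to come first.

```r
library(dplyr)
library(tidyr)

# Toy data; the values are made up
toy <- tibble(
  WELL = c("A", "A", "B"),
  GR   = c(50, 250, 80),
  RDEP = c(1, 10, 100)
)

long <- toy %>%
  filter(GR > 0, GR < 200) %>%             # filter while GR is still a column
  mutate(log10_RDEP = log10(RDEP)) %>%     # transform before pivoting
  select(-RDEP) %>%
  pivot_longer(!WELL, names_to = "logs", values_to = "value")

print(long)
```

Note that the filter on GR removes whole rows, so the samples of every other log at those depths are dropped from the plot as well.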
Density neutron plot
Just a quick plot, with the color scale set between 0 and 100. GR values of 0 or below are filtered out of the plot, while values above 100 are colored as 100.
df %>%
  filter(GR > 0) %>%
  ggplot(aes(RHOB, NPHI, color = GR)) +
  geom_point(size = 1) +
  facet_wrap(~WELL, ncol = 2) +
  # squish maps out-of-range GR values to the nearest limit of the color scale
  scale_color_gradient(low = "yellow", high = "brown", limits = c(0, 100),
                       oob = scales::squish)
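The effect of the oob argument can be checked on a plain vector. The sketch below, with made-up GR values, compares scales::squish with scales::censor, which is the default out-of-bounds behavior of continuous ggplot2 scales.

```r
library(scales)

# Made-up GR values, some outside the 0 to 100 color range
gr <- c(-20, 0, 45, 100, 180, NA)

# squish: out-of-range values are moved to the nearest limit
squished <- squish(gr, range = c(0, 100))

# censor: out-of-range values become NA and are drawn in the NA color
censored <- censor(gr, range = c(0, 100))

print(squished)  # 0 0 45 100 100 NA
print(censored)  # NA 0 45 100 NA NA
```

Without squish, every point with GR above 100 would be drawn in the grey NA color instead of the darkest brown.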